Add a New SDK
This guide walks through adding support for a new programming language (e.g., Rust, C#, PHP, Ruby) to Pseudata. Use the existing implementations (Go, Java, Python, TypeScript) as reference.
Quick Summary
Much of the code is generated automatically from TypeSpec definitions. Your main task is to write the foundation code (Generator, SeedFrom, ID Utils) and implement the custom TypeSpec emitters to generate code for your target language.
What You’ll Implement:
- Generator - PCG32 random number generator
- SeedFrom - String to seed conversion
- TypeSpec Emitters - Code generation for your language
- PseudoArray - Generic base class that generates elements on demand
- ID Utils - UUID encoding/decoding
- Primitives - ~75 data generation methods
- Models - Auto-generated via quicktype
Each component has fixture-based tests to ensure cross-language consistency.
Code Generation Flow
Pseudata uses a multi-stage code generation pipeline:
Stage 1: TypeSpec → JSON Schema
```bash
tsp compile src  # Generates JSON Schema in tsp-output/
```
Output: tsp-output/@typespec/json-schema/*.json
- User.json - User model schema
- Address.json - Address model schema
- Resources.json - Resources model schema
Stage 2: JSON Schema → Models (via quicktype)
```bash
quicktype --src-lang schema \
  tsp-output/@typespec/json-schema/*.json \
  -o ../sdks/rust/src/models.rs \
  --lang rust \
  --visibility public \
  --derive-debug
```
Output: models.{go,java,py,ts,rs} - Data structures only
Stage 3: TypeSpec → Code (via custom emitters)
```bash
# Automatically runs during tsp compile (configured in tspconfig.yaml)
```
Custom Emitters (in typespec/lib/):
- resource-emitter.js
  - Reads files from typespec/resources/ directory
  - Generates: resources.{go,java,py,ts} with embedded data
  - Includes AVAILABLE_LOCALES and RESOURCES map
- interface-emitter.js (called by array-emitter.js)
  - Reads primitives.tsp interface definition
  - Generates: primitives.{go,java,py,ts} interfaces
- array-emitter.js
  - Reads @array decorators on models
  - Generates: arrays.{go,java,py,ts} with generator functions
  - Example: UserArray, AddressArray, generateUser(), generateAddress()
Configuration (tspconfig.yaml):
```yaml
emit:
  - "@typespec/json-schema" # Stage 1: JSON Schema
  - "../lib/emitter.js"     # Stage 3: Custom emitters
```
Complete Generation Command
```bash
cd typespec
npm run generate
```
This runs:
- tsp compile src - TypeSpec → JSON Schema + custom emitters
- quicktype - JSON Schema → Models for all languages
- Post-processing (e.g., fix-resources-visibility.js for Java)
Directory Structure
```
sdks/
  rust/                                 # Example: Adding Rust
    src/
      generator.rs                      # PCG32 implementation
      seed_from.rs                      # String → seed conversion
      id_utils.rs                       # PseudoID encode/decode
      primitives.rs                     # Generated interface
      primitives_impl.rs                # Implementation
      models.rs                         # Generated models
      arrays.rs                         # Generated arrays
      pseudo_array.rs                   # Generic array base
      resources.rs                      # Generated resources
      lib.rs                            # Public exports
    tests/
      generator_test.rs                 # Fixture tests
      seed_test.rs                      # Fixture tests
      id_test.rs                        # Fixture tests
      primitives_vectors_test.rs        # Fixture tests
      array_user_vectors_test.rs        # Fixture tests
      array_address_vectors_test.rs     # Fixture tests
    Cargo.toml
    README.md
```
Implementation Guide
Implementation Sequence
Components should be implemented in order, as later components depend on earlier ones:
```
1. Generator (PCG32) - Core random number generator
   ↓
2. SeedFrom - String to seed conversion
   ↓
3. TypeSpec Emitters - Code generation for your language
   ├─ resource-emitter.js (embeds locale data)
   ├─ interface-emitter.js (generates Primitives interface)
   └─ array-emitter.js (generates array classes)
   ↓
4. PseudoArray - PseudoArray wrapper
   ↓
5. ID Utils - UUID encoding/decoding
   ↓
6. Primitives Implementation - Implement generated interface
   ↓
7. Models (via quicktype) - Auto-generated data structures
   ↓
8. Array Fixture Tests - End-to-end verification
```
Step 1: Generator (PCG32 Algorithm)
What: Deterministic random number generator using the PCG32 algorithm.
Reference:
- Go: generator.go
- Specification: fixtures/pcg32_test_vectors.json
Key Methods:
```rust
// Example signatures (adjust to language idioms); method bodies are omitted
struct Generator {
    state: u64,
    inc: u64,
}

impl Generator {
    fn new(seed: u64, seq: u64) -> Self;
    fn advance(&mut self, delta: u64);
    fn next_int(&mut self) -> u32; // Core PCG32 output: uint32
    fn intn(&mut self, n: u32) -> u32; // Uses u32 like Go
    fn probability(&mut self, p: f32) -> bool;
    fn next_float(&mut self) -> f32;
    fn next_bool(&mut self) -> bool;
    fn int_range(&mut self, min: u32, max: u32) -> u32; // Uses u32 like Go
    fn float_range(&mut self, min: f32, max: f32) -> f32;
    fn uuid(&mut self) -> String;
}
```
Implementation Requirements:
- PCG32 state: 64-bit state + 64-bit increment (must be odd)
- Advance: Jump to an arbitrary position without stepping through intermediate values (O(log delta) multiplications, at most 64 iterations for a 64-bit delta)
- Bounded random: intn(n) must use an unbiased method (rejection sampling)
- UUID v4: Generate RFC 4122 compliant UUIDs
- Unsigned semantics: PCG32 produces uint32 values. Languages without unsigned types (e.g., Java) should use wider types (long) to represent the full uint32 range
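The requirements above can be sketched in Rust as follows. This is a minimal sketch that follows the publicly documented PCG32 reference algorithm (XSH-RR output, reference seeding, logarithmic jump-ahead, threshold-based rejection for bounded values). It assumes Pseudata's Go generator.go does the same; in particular, the seeding steps in new() and the exact rejection method in intn() are assumptions, and the fixture file remains the authority.

```rust
/// LCG multiplier used by the PCG reference implementation.
const MULTIPLIER: u64 = 6364136223846793005;

pub struct Generator {
    state: u64,
    inc: u64, // stream selector, always odd
}

impl Generator {
    /// Reference seeding (assumed to match generator.go):
    /// inc = (seq << 1) | 1, step once, add the seed, step again.
    pub fn new(seed: u64, seq: u64) -> Self {
        let mut g = Generator { state: 0, inc: (seq << 1) | 1 };
        g.next_int();
        g.state = g.state.wrapping_add(seed);
        g.next_int();
        g
    }

    /// Core PCG32 step: advance the 64-bit LCG, then apply the XSH-RR output function.
    pub fn next_int(&mut self) -> u32 {
        let old = self.state;
        self.state = old.wrapping_mul(MULTIPLIER).wrapping_add(self.inc);
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }

    /// Jump ahead by `delta` steps using repeated squaring of the LCG transition.
    pub fn advance(&mut self, mut delta: u64) {
        let mut cur_mult = MULTIPLIER;
        let mut cur_plus = self.inc;
        let mut acc_mult: u64 = 1;
        let mut acc_plus: u64 = 0;
        while delta > 0 {
            if delta & 1 == 1 {
                acc_mult = acc_mult.wrapping_mul(cur_mult);
                acc_plus = acc_plus.wrapping_mul(cur_mult).wrapping_add(cur_plus);
            }
            cur_plus = cur_mult.wrapping_add(1).wrapping_mul(cur_plus);
            cur_mult = cur_mult.wrapping_mul(cur_mult);
            delta >>= 1;
        }
        self.state = acc_mult.wrapping_mul(self.state).wrapping_add(acc_plus);
    }

    /// Unbiased value in [0, n): reject outputs below 2^32 mod n, then reduce.
    pub fn intn(&mut self, n: u32) -> u32 {
        assert!(n > 0, "n must be positive");
        let threshold = n.wrapping_neg() % n;
        loop {
            let r = self.next_int();
            if r >= threshold {
                return r % n;
            }
        }
    }
}
```

All arithmetic uses wrapping operations because PCG32 relies on modulo 2^64 overflow; in languages where overflow traps or promotes to a wider type, the same masking has to be done explicitly.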
Test: fixtures/pcg32_test_vectors.json
- 12 test vectors covering various seeds, sequences, and advance operations
- Must match exact output sequences across all languages
Example Test Pattern:
```rust
#[test]
fn test_generator_with_vectors() {
    let vectors: Vec<TestCase> = load_json("../../fixtures/pcg32_test_vectors.json");

    for tc in vectors {
        let mut gen = Generator::new(tc.inputs.seed, tc.inputs.seq);
        gen.advance(tc.inputs.advance);

        for (i, expected) in tc.expected.outputs.iter().enumerate() {
            assert_eq!(gen.next_int(), *expected, "Output {} mismatch", i);
        }
    }
}
```
Step 2: SeedFrom (String → Seed Conversion)
What: Deterministic conversion from strings to 64-bit seeds.
Reference:
- Go: seed_from.go
- Algorithm: FNV-1a hash
Key Function:
```rust
fn seed_from(s: &str) -> u64 {
    const OFFSET_BASIS: u64 = 14695981039346656037;
    const FNV_PRIME: u64 = 1099511628211;

    let mut hash = OFFSET_BASIS;
    for byte in s.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```
Test: fixtures/seed_test_vectors.json
- 18 test cases including empty strings, ASCII, Unicode, long strings
- Must produce identical seeds across all languages
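Critical: Handle UTF-8 correctly. Hash the UTF-8 bytes of the input, not its characters or UTF-16 code units; in languages whose strings are not UTF-8 internally (e.g., Java, C#), encode to UTF-8 first.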
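To illustrate why the byte-versus-character distinction matters, here is a small sketch that places the byte-based hash next to a deliberately wrong character-based variant. The name seed_from_chars_wrong is hypothetical and exists only for the comparison; the two functions agree on pure ASCII input and diverge as soon as a multi-byte character appears.

```rust
const OFFSET_BASIS: u64 = 14695981039346656037;
const FNV_PRIME: u64 = 1099511628211;

/// FNV-1a (64-bit) over the UTF-8 bytes of `s`.
pub fn seed_from(s: &str) -> u64 {
    s.bytes().fold(OFFSET_BASIS, |hash, byte| {
        (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME)
    })
}

/// Deliberately wrong (hypothetical) variant: mixes in one Unicode scalar
/// value per step instead of one byte, which is what a naive port that
/// iterates over "characters" tends to do.
pub fn seed_from_chars_wrong(s: &str) -> u64 {
    s.chars().fold(OFFSET_BASIS, |hash, c| {
        (hash ^ u64::from(u32::from(c))).wrapping_mul(FNV_PRIME)
    })
}

/// Number of bytes the correct hash consumes for `s`.
pub fn utf8_len(s: &str) -> usize {
    s.len()
}
```

For example, "é" is one character but two UTF-8 bytes (0xC3 0xA9), so the correct hash performs two mixing steps where the character-based variant performs one.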
Step 3: TypeSpec Emitters
TypeSpec emitters are TypeScript programs that generate code for your language. They read TypeSpec definitions and generate Primitives interfaces, Arrays, and Resources.
Why Custom Emitters?
- Primitives: Interface only (implementation is hand-written)
- Arrays: Need generator functions with @generator decorator logic
- Resources: Embed file data from typespec/resources/ directory
- Quicktype only generates data models, not these specialized components
Emitter Architecture
```javascript
// typespec/lib/emitter.js - Entry point called by tsp compile
export async function $onEmit(context) {
  await emitVirtualArrays(context); // → Primitives interface + Arrays
  await emitResourceData(context); // → Resources
}
```
Location: typespec/lib/
Three files to modify:
- resource-emitter.js - Embed locale data
- interface-emitter.js - Generate Primitives trait/interface
- array-emitter.js - Generate array classes + generator functions
Configuration (tspconfig.yaml):
```yaml
emit:
  - "@typespec/json-schema" # Built-in: Models → JSON Schema
  - "../lib/emitter.js"     # Custom: Everything else
```
3.1 Resource Emitter
Purpose: Read files from typespec/resources/ and embed them as code.
This generates the largest file: hundreds of names, cities, etc. per locale.
Input: Directory structure
```
typespec/resources/
  global/
    email_domains.txt           → ["gmail.com", "yahoo.com", ...]
  en/
    nouns.txt                   → ["book", "table", ...]
    adjectives.txt
    verbs.txt
    months.txt                  → ["January", "February", "March", ...]
    weekdays.txt                → ["Monday", "Tuesday", "Wednesday", ...]
  en_US/
    given_male_names.txt        → ["James", "John", "Robert", ...]
    given_female_names.txt      → ["Mary", "Patricia", ...]
    family_names.txt
    cities.txt
    streets.txt
    zipcodes.txt
  US/
    address_format.txt          → "{street_address}, {locality}, {region} {postal_code}"
    street_format.txt
    timezones.txt               → ["America/New_York", "America/Los_Angeles", ...]
    phone_number_patterns.txt   → ["(###) ###-####", "###-###-####"]
```
Your Task: Add Rust resource embedding
Add to resource-emitter.js:
```javascript
// Escape backslashes and double quotes for use inside a Rust string literal
function escapeRustString(s) {
  return s.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

function formatRustStringArray(arr) {
  return arr.map((s) => `"${escapeRustString(s)}".to_string()`).join(", ");
}

// 1. Add Rust resource generation
function emitRustResources(locales, program) {
  let rustCode = "use std::collections::HashMap;\n";
  rustCode += "use once_cell::sync::Lazy;\n";
  rustCode += "use crate::models::Resources;\n\n";

  // Available locales constant
  rustCode += "pub static AVAILABLE_LOCALES: &[&str] = &[\n";
  rustCode += locales.map((l) => `    "${l}"`).join(",\n") + ",\n";
  rustCode += "];\n\n";

  // Resources HashMap
  rustCode += "pub static RESOURCES: Lazy<HashMap<&'static str, Resources>> = Lazy::new(|| {\n";
  rustCode += "    let mut m = HashMap::new();\n\n";

  for (const locale of locales) {
    const resources = loadResourcesForLocale(locale);

    rustCode += `    m.insert("${locale}", Resources {\n`;
    rustCode += `        email_domains: vec![${formatRustStringArray(resources.emailDomains)}],\n`;
    rustCode += `        male_given_names: vec![${formatRustStringArray(resources.maleGivenNames)}],\n`;
    rustCode += `        female_given_names: vec![${formatRustStringArray(resources.femaleGivenNames)}],\n`;
    rustCode += `        other_given_names: vec![${formatRustStringArray(resources.otherGivenNames)}],\n`;
    rustCode += `        family_names: vec![${formatRustStringArray(resources.familyNames)}],\n`;
    rustCode += `        cities: vec![${formatRustStringArray(resources.cities)}],\n`;
    rustCode += `        streets: vec![${formatRustStringArray(resources.streets)}],\n`;
    rustCode += `        nouns: vec![${formatRustStringArray(resources.nouns)}],\n`;
    rustCode += `        adjectives: vec![${formatRustStringArray(resources.adjectives)}],\n`;
    rustCode += `        verbs: vec![${formatRustStringArray(resources.verbs)}],\n`;
    rustCode += `        zipcodes: vec![${formatRustStringArray(resources.zipcodes)}],\n`;
    rustCode += `        states: vec![${formatRustStringArray(resources.states)}],\n`;
    // Format strings are escaped too, since they are embedded in string literals
    rustCode += `        address_format: "${escapeRustString(resources.addressFormat)}".to_string(),\n`;
    rustCode += `        street_format: "${escapeRustString(resources.streetFormat)}".to_string(),\n`;
    rustCode += `        timezones: vec![${formatRustStringArray(resources.timezones)}],\n`;
    rustCode += `        phone_number_patterns: vec![${formatRustStringArray(resources.phoneNumberPatterns)}],\n`;
    rustCode += `    });\n\n`;
  }

  rustCode += "    m\n";
  rustCode += "});\n";

  return rustCode;
}

// 2. Register Rust in emitResourceData()
export async function emitResourceData(context) {
  const { program } = context;
  const locales = ["ar_SA", "de_DE", "en_CA", "en_GB", "en_US" /* ... */];

  // Add Rust
  const rustCode = emitRustResources(locales, program);
  await program.host.writeFile("sdks/rust/src/resources.rs", rustCode);

  // ... existing Go, Java, Python, TypeScript
}
```
Example Output (Rust, truncated):
```rust
// sdks/rust/src/resources.rs (auto-generated, ~3000 lines)
use std::collections::HashMap;
use once_cell::sync::Lazy;
use crate::models::Resources;

pub static AVAILABLE_LOCALES: &[&str] = &[
    "ar_SA", "de_DE", "en_CA", "en_GB", "en_US", "es_MX", "fr_CA",
    "fr_FR", "hu_HU", "ja_JP", "pt_BR", "tr_TR", "vi_VN", "zh_CN",
];

pub static RESOURCES: Lazy<HashMap<&'static str, Resources>> = Lazy::new(|| {
    let mut m = HashMap::new();

    m.insert("en_US", Resources {
        email_domains: vec!["gmail.com".to_string(), "yahoo.com".to_string(), /* ... */],
        male_given_names: vec!["James".to_string(), "Robert".to_string(), /* ... */],
        female_given_names: vec!["Mary".to_string(), "Patricia".to_string(), /* ... */],
        // ... all other fields
    });

    m.insert("ar_SA", Resources {
        email_domains: vec!["gmail.com".to_string(), /* ... */],
        male_given_names: vec!["محمد".to_string(), "أحمد".to_string(), /* ... */],
        // ... UTF-8 encoded Arabic names
    });

    // ... remaining locales (14 in total)

    m
});
```
Important:
- UTF-8 encoding: Properly escape strings (Arabic, Chinese, etc.)
- String ownership: .to_string() for Rust ownership
- Lazy initialization: Use once_cell::Lazy for static initialization
- Large file: ~3000 lines, 200KB+, generated from ~250 resource files
Resource File Resolution:
```javascript
// resource-emitter.js logic (simplified; exists() and read() stand in for file-system helpers)
function getResourceFile(locale, country, language, filename) {
  // Try locale-level first
  if (exists(`resources/${locale}/${filename}`)) return read(`resources/${locale}/${filename}`);

  // Try language-level: ar/nouns.txt
  if (exists(`resources/${language}/${filename}`)) return read(`resources/${language}/${filename}`);

  // Try country-level: SA/timezones.txt
  if (exists(`resources/${country}/${filename}`)) return read(`resources/${country}/${filename}`);

  // Try global: global/email_domains.txt
  if (exists(`resources/global/${filename}`)) return read(`resources/global/${filename}`);

  // Fallback: empty or error
  return [];
}
```
3.2 Interface Emitter
Purpose: Generate Primitives trait/interface from TypeSpec definition
Called by: array-emitter.js (automatically)
Input: primitives.tsp interface
```typespec
interface Primitives {
  id(): string;
  gender(): string;
  locale(): string;
  element(items: string[]): string;
  digit(length: int32): string;
  probability(p: float32): boolean;
  // ... ~75 methods total
}
```
Your Task: Add Rust type mapping and code generation
Add to interface-emitter.js:
```javascript
// 1. Add Rust type mapping
function mapRustType(tspType) {
  if (tspType.endsWith("[]")) {
    const elementType = tspType.slice(0, -2);
    return `Vec<${mapRustType(elementType)}>`;
  }
  const typeMap = {
    string: "String",
    int32: "i32",
    int64: "i64",
    uint32: "u32",
    uint64: "u64",
    float32: "f32",
    float64: "f64",
    boolean: "bool",
    numeric: "f64",
  };
  return typeMap[tspType] || "String";
}

// 2. Add Rust interface generation
// snakeCase() is a camelCase → snake_case helper; add one if the emitter lacks it
function emitRustInterface(interfaceName, methods) {
  const methodDecls = methods
    .map((m) => {
      const params = m.parameters.map((p) => `${snakeCase(p.name)}: ${mapRustType(p.type)}`).join(", ");
      const returnType = mapRustType(m.returnType);
      return `    fn ${snakeCase(m.name)}(&self${params ? ", " + params : ""}) -> ${returnType};`;
    })
    .join("\n");

  return `pub trait ${interfaceName} {\n${methodDecls}\n}`;
}

// 3. Register Rust in emitAllInterfaces()
async function emitAllInterfaces(program, interfaces) {
  // ... existing Go, Java, Python, TypeScript

  // Add Rust
  const rustCode = interfaces.map((i) => emitRustInterface(i.name, i.methods)).join("\n\n");

  await program.host.writeFile("sdks/rust/src/primitives.rs", rustCode);
}
```
Example Output (Rust):
```rust
// sdks/rust/src/primitives.rs (auto-generated)
pub trait Primitives {
    fn id(&self) -> String;
    fn uuid(&self) -> String;
    fn gender(&self) -> String;
    fn locale(&self) -> String;
    fn element(&self, items: Vec<String>) -> String;
    fn digit(&self, length: i32) -> String;
    fn letter(&self, length: i32) -> String;
    fn alnum(&self, length: i32) -> String;
    fn probability(&self, p: f32) -> bool;
    fn next_int(&self) -> i64;
    fn intn(&self, n: i32) -> i32;
    fn next_float(&self) -> f32;
    fn next_boolean(&self) -> bool;
    fn int_range(&self, min: i32, max: i32) -> i32;
    fn float_range(&self, min: f32, max: f32) -> f32;
    // ... remaining methods (~75 in total)
}
```
Type Mapping Table:
| TypeSpec | Go | Java | Python | TypeScript | Rust (example) |
|---|---|---|---|---|---|
| string | string | String | str | string | String |
| string[] | []string | String[] | list[str] | string[] | Vec<String> |
| int32 | int64 | int | int | number | i32 |
| int64 | int64 | long | int | number | i64 |
| float32 | float32 | float | float | number | f32 |
| boolean | bool | boolean | bool | boolean | bool |
Testing:
```bash
cd typespec
npm run generate
# Check: sdks/rust/src/primitives.rs should be created
```
3.3 Array Emitter
Purpose: Generate array classes and generator functions from @array decorators
This emitter parses @generator decorators and generates field assignment code.
Input: TypeSpec model with decorators
```typespec
// pseudata.tsp
@array(101)
model User {
  @generator("id") id: string;
  @generator("id") sub: string;
  @generator("compositeUserName") name: string;
  @generator("genderedGivenName") given_name: string;
  @generator("familyName") family_name: string;
  @generator("middleName") middle_name: string;
  @generator("nickname") nickname: string;
  @generator("preferredUsername") preferred_username: string;
  @generator("email") email: string;
  @generator("gender") gender: string;
  @generator("locale") locale: string;
  @generator("avatarUrl") picture: string;
  @generator("profileUrl") profile: string;
  @generator("websiteUrl") website: string;
  @generator("probability", 0.85) email_verified: boolean;
  @generator("birthdateStr", 18, 65, Default.RefDate) birthdate: string;
  @generator("zoneinfo") zoneinfo: string;
  @generator("phoneNumber") phone_number: string;
  @generator("probability", 0.75) phone_number_verified: boolean;
  @generator("dateTimePast", Default.RefDate, Default.Days) updated_at: safeint;
}
```
Your Task: Add Rust array generation
Add to array-emitter.js:
```javascript
// 1. Add Rust argument formatting
function formatRustArg(arg, paramType) {
  if (typeof arg === "string") {
    return `"${arg.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
  } else if (typeof arg === "boolean") {
    return arg ? "true" : "false";
  } else if (typeof arg === "number") {
    // Rust: f32 suffix for floats
    return Number.isInteger(arg) ? arg.toString() : `${arg}_f32`;
  }
  return String(arg);
}

// 2. Add Rust array class generation
function emitRustArrays(models, program) {
  let rustCode = "use crate::{PseudoArray, PrimitivesImpl, models::*};\n\n";

  for (const model of models) {
    const typeSeq = model.decorators.array;
    const className = `${model.name}Array`;

    // Array struct
    rustCode += `pub struct ${className} {\n`;
    rustCode += `    inner: PseudoArray<${model.name}>,\n`;
    rustCode += `}\n\n`;

    // Constructor and accessor
    rustCode += `impl ${className} {\n`;
    rustCode += `    pub fn new(seed: u64) -> Self {\n`;
    rustCode += `        Self {\n`;
    rustCode += `            inner: PseudoArray::new(seed, ${typeSeq}, generate_${snakeCase(model.name)}),\n`;
    rustCode += `        }\n`;
    rustCode += `    }\n\n`;
    rustCode += `    pub fn at(&self, index: i32) -> ${model.name} {\n`;
    rustCode += `        self.inner.at(index)\n`;
    rustCode += `    }\n`;
    rustCode += `}\n\n`;

    // Generator function
    rustCode += `fn generate_${snakeCase(model.name)}(world_seed: u64, type_seq: u64, index: i32) -> ${model.name} {\n`;
    rustCode += `    let p = PrimitivesImpl::new(world_seed, type_seq, index);\n`;
    rustCode += `    ${model.name} {\n`;

    for (const field of model.fields) {
      const gen = field.decorators.generator;
      if (!gen) continue;

      const methodName = snakeCase(gen.methodName);
      const args = gen.args.map((a) => formatRustArg(a)).join(", ");
      const call = args ? `p.${methodName}(${args})` : `p.${methodName}()`;

      rustCode += `        ${snakeCase(field.name)}: ${call},\n`;
    }

    rustCode += `    }\n`;
    rustCode += `}\n\n`;
  }

  return rustCode;
}

// 3. Register Rust in main emit function
export async function $onEmit(context) {
  const { program } = context;
  // ... parse models with @array decorator into arrayModels

  // Add Rust generation
  const rustCode = emitRustArrays(arrayModels, program);
  await program.host.writeFile("sdks/rust/src/arrays.rs", rustCode);

  // ... existing Go, Java, Python, TypeScript
}
```
Example Output (Rust):
```rust
// sdks/rust/src/arrays.rs (auto-generated)
use crate::{PseudoArray, PrimitivesImpl, models::*};

pub struct UserArray {
    inner: PseudoArray<User>,
}

impl UserArray {
    pub fn new(seed: u64) -> Self {
        Self {
            inner: PseudoArray::new(seed, 101, generate_user),
        }
    }

    pub fn at(&self, index: i32) -> User {
        self.inner.at(index)
    }
}

fn generate_user(world_seed: u64, type_seq: u64, index: i32) -> User {
    let p = PrimitivesImpl::new(world_seed, type_seq, index);
    User {
        id: p.id(),
        sub: p.id(),
        name: p.composite_user_name(),
        given_name: p.gendered_given_name(),
        family_name: p.family_name(),
        middle_name: p.middle_name(),
        nickname: p.nickname(),
        preferred_username: p.preferred_username(),
        email: p.email(),
        gender: p.gender(),
        locale: p.locale(),
        picture: p.avatar_url(),
        profile: p.profile_url(),
        website: p.website_url(),
        email_verified: p.probability(0.85_f32),
        birthdate: p.birthdate_str(18, 65, 1735689600),
        zoneinfo: p.zoneinfo(),
        phone_number: p.phone_number(),
        phone_number_verified: p.probability(0.75_f32),
        updated_at: p.date_time_past(1735689600, 30),
    }
}

pub struct AddressArray {
    inner: PseudoArray<Address>,
}

impl AddressArray {
    pub fn new(seed: u64) -> Self {
        Self {
            inner: PseudoArray::new(seed, 110, generate_address),
        }
    }

    pub fn at(&self, index: i32) -> Address {
        self.inner.at(index)
    }
}

fn generate_address(world_seed: u64, type_seq: u64, index: i32) -> Address {
    let p = PrimitivesImpl::new(world_seed, type_seq, index);
    Address {
        id: p.id(),
        formatted: p.formatted_address(),
        street_address: p.street_address(),
        locality: p.locality(),
        region: p.region(),
        postal_code: p.postal_code(),
        country: p.country(),
        locale: p.locale(),
    }
}
```
Key Points:
- Float literals: Rust needs _f32 suffix (Java needs f, Python/TS don’t)
- Snake case: Convert TypeSpec camelCase to Rust snake_case
- Type mapping: Use your mapRustType() function
- Error handling: Validate decorator arguments
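The snake_case rule used for method and field names can be sketched as below. The real helper would live in the JavaScript emitter; it is shown in Rust here only to keep the examples in one language, and the function name snake_case is an assumption rather than an existing Pseudata API. The sketch handles the simple camelCase names seen in the TypeSpec definitions and does not try to treat acronyms specially (so a hypothetical userID would become user_i_d).

```rust
/// Converts a camelCase identifier to snake_case by inserting an underscore
/// before every ASCII uppercase letter (except at the start) and lowercasing it.
/// Identifiers that are already snake_case pass through unchanged.
pub fn snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

With this rule, the generator name compositeUserName maps to the trait method composite_user_name, matching the generated arrays.rs shown above.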
Testing Your Emitters
After implementing all three emitters:
```bash
# 1. Generate code
cd typespec
npm run generate

# 2. Check generated files exist
ls -lh ../sdks/rust/src/
# Should see: primitives.rs, arrays.rs, resources.rs

# 3. Verify compilation (Rust example)
cd ../sdks/rust
cargo check
# Should compile without errors (models.rs from quicktype required too)
```
Common Issues:
- Missing type mapping: Add to mapRustType()
- Float literal format: Check formatRustArg()
- String escaping: Handle quotes and backslashes
- UTF-8: Test with Arabic/Chinese resource files
Step 4: PseudoArray (PseudoArray Base Class)
What: Generic PseudoArray that generates elements on-demand via a generator function.
Reference:
- Go: pseudo_array.go
- TypeScript: sdks/typescript/src/pseudo-array.ts
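Core Concept: Random access without storing elements. Each at(index) call recomputes the element from (world_seed, type_seq, index), so nothing is cached.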
Example Implementation (Rust):
```rust
pub struct PseudoArray<T> {
    world_seed: u64,
    type_seq: u64,
    generator_fn: fn(u64, u64, i32) -> T,
}

impl<T> PseudoArray<T> {
    pub fn new(world_seed: u64, type_seq: u64, generator_fn: fn(u64, u64, i32) -> T) -> Self {
        Self {
            world_seed,
            type_seq,
            generator_fn,
        }
    }

    pub fn at(&self, index: i32) -> T {
        (self.generator_fn)(self.world_seed, self.type_seq, index)
    }
}
```
Used by generated arrays:
```rust
// Generated by array-emitter.js
pub struct UserArray {
    inner: PseudoArray<User>,
}

impl UserArray {
    pub fn new(seed: u64) -> Self {
        Self {
            inner: PseudoArray::new(seed, 101, generate_user),
        }
    }

    pub fn at(&self, index: i32) -> User {
        self.inner.at(index)
    }
}
```
No tests needed - tested via array fixture tests.
Step 5: ID Utils (PseudoID Encoding/Decoding)
What: Encode/decode deterministic UUIDs (UUID v8) with embedded metadata.
Reference:
- Go: id_utils.go
- Documentation: PseudoID
Bit Layout (128-bit UUID v8):
```
Bits   63..........................0  15..............0  1 0  15..........0  43..................0
       worldSeed[63:0]  typeSeq[15:0]  skip  index[43:0]
UUID:  SSSSSSSS-SSSS-8SSS-vSNN-TTTT-IIIIIIIIIIII

v    = version bits (1000 = v8)
skip = 2 reserved bits (must be 0)
```
Detailed Format:
```
UUID v8 Format:
SSSSSSSS-SSSS-8SSS-vvTT-IIIIIIIIII

S  = worldSeed (64 bits)
8  = version nibble (0x8)
v  = variant bits (0b10)
-- = skip bits (2 bits, must be 0)
T  = typeSeq (16 bits)
I  = index (40 bits)

Visual pattern:
worldSeed[63:32]
worldSeed[31:16]
8 + worldSeed[15:3]
variant + skip + typeSeq[15:14]
typeSeq[13:0] + index[39:32]
index[31:0]
```
Functions to implement:
```rust
fn encode_id(world_seed: u64, type_seq: u16, index: u64) -> String;
fn decode_id(uuid: &str) -> Option<(u64, u16, u64)>; // (worldSeed, typeSeq, index)
```
Validation: Decode must fail if skip bits != 0 (forward compatibility).
Test: fixtures/id_test_vectors.json
- 13 test vectors covering edge cases, bit boundaries, validation
- Test vector #13 has should_decode: false (invalid skip bits)
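A first step that any decode_id implementation is likely to need is parsing the UUID text and checking the version and variant fields, whose positions are fixed by the UUID standard. The sketch below covers only that step; the helper names parse_uuid and parse_uuid_v8 are hypothetical, and extracting worldSeed, typeSeq, skip bits, and index is deliberately left out because it must follow the layout in id_utils.go and the fixture vectors.

```rust
/// Parses canonical 8-4-4-4-12 UUID text into a 128-bit integer.
/// Returns None for a wrong length, misplaced hyphens, or non-hex characters.
pub fn parse_uuid(text: &str) -> Option<u128> {
    let bytes = text.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    let mut value: u128 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if matches!(i, 8 | 13 | 18 | 23) {
            if b != b'-' {
                return None;
            }
            continue;
        }
        // to_digit(16) accepts 0-9, a-f and A-F only
        let digit = char::from(b).to_digit(16)?;
        value = (value << 4) | u128::from(digit);
    }
    Some(value)
}

/// Accepts only UUIDs whose version nibble is 8 and whose variant bits are 0b10.
pub fn parse_uuid_v8(text: &str) -> Option<u128> {
    let value = parse_uuid(text)?;
    let version = (value >> 76) & 0xF; // high nibble of byte 6
    let variant = (value >> 62) & 0x3; // top two bits of byte 8
    if version == 8 && variant == 0b10 {
        Some(value)
    } else {
        None
    }
}
```

Working on a single u128 keeps the later field extraction to shifts and masks, which ports easily to languages with 128-bit or big-integer types; languages without them can split the value into two 64-bit halves.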
Step 6: Primitives Implementation
What: Data generation methods using Generator and Resources.
Files:
- Interface (generated): primitives.rs from TypeSpec
- Implementation: primitives_impl.rs (hand-written)
- Resources: resources.rs (generated from typespec/resources/)
Reference:
- Go: primitives_impl.go
- TypeSpec: typespec/src/primitives.tsp
Structure:
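```rust
struct PrimitivesImpl {
    world_seed: u64,
    type_seq: u64,
    index: i32,
}

impl PrimitivesImpl {
    // Fresh generator positioned at this element's index.
    // The cast is for illustration; follow the Go reference for index handling.
    fn rng(&self) -> Generator {
        let mut rng = Generator::new(self.world_seed, self.type_seq);
        rng.advance(self.index as u64);
        rng
    }
}

impl Primitives for PrimitivesImpl {
    fn id(&self) -> String {
        encode_id(self.world_seed, self.type_seq as u16, self.index as u64)
    }

    fn locale(&self) -> String {
        let mut rng = self.rng();
        let i = rng.intn(AVAILABLE_LOCALES.len() as u32) as usize;
        AVAILABLE_LOCALES[i].to_string()
    }

    // ... remaining primitive methods (~75 in total)
}
```
Critical Pattern: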
- Store rng() in a local variable if it is used more than once in a function
- Each rng() call creates a new generator at the same position
- Multiple calls without storing = same value repeated!
```rust
// ❌ WRONG - repeats same character
fn digit(&self, length: i32) -> String {
    (0..length).map(|_| self.rng().intn(10).to_string()).collect()
}

// ✅ Correct - advances through sequence
fn digit(&self, length: i32) -> String {
    let mut rng = self.rng(); // Store once
    (0..length).map(|_| rng.intn(10).to_string()).collect()
}
```
Test: fixtures/primitives_test_vectors.json
- 12 test cases covering all ~75 primitive methods
- Generated with Go: go test -run TestPrimitivesWithVectors -update
Resource Access:
- Resources are locale-dependent (e.g., RESOURCES["en_US"])
- Field naming: TypeSpec uses camelCase, adjust for language conventions
- Python uses dict access: resources["maleGivenNames"]
- Java/TS use property access: resources.maleGivenNames
Step 7: Models (Auto-generated via quicktype)
What: Data structures generated from TypeSpec models via quicktype.
Reference:
- TypeSpec: typespec/src/pseudata.tsp
- Generation: typespec/package.json scripts
Generation Flow:
```bash
# 1. TypeSpec compiles to JSON Schema
tsp compile src
# Output: tsp-output/@typespec/json-schema/User.json, Address.json

# 2. Quicktype generates models from JSON Schema
quicktype --src-lang schema \
  tsp-output/@typespec/json-schema/*.json \
  -o ../sdks/rust/src/models.rs \
  --lang rust \
  --visibility public \
  --derive-debug
```
Quicktype Options by Language:
```bash
# Go
quicktype --lang go --package pseudata --just-types-and-package

# Java
quicktype --lang java --package dev.pseudata --just-types

# Python
quicktype --lang python --just-types

# TypeScript
quicktype --lang typescript --just-types

# Rust (example)
quicktype --lang rust --visibility public --derive-debug
```
What quicktype generates:
- Data structures (User, Address, Resources)
- JSON serialization/deserialization
- Type annotations
What quicktype does NOT generate:
- Primitives interface (custom emitter)
- Array implementations (custom emitter)
- Resource data (custom emitter)
- Generator functions (custom emitter)
Example Output:
```rust
#[derive(Serialize, Deserialize, Debug)]
pub struct User {
    pub id: String,
    pub sub: String,
    pub name: String,
    pub given_name: String,
    pub family_name: String,
    pub middle_name: String,
    pub nickname: String,
    pub preferred_username: String,
    pub email: String,
    pub gender: String,
    pub locale: String,
    pub picture: String,
    pub profile: String,
    pub website: String,
    pub email_verified: bool,
    pub birthdate: String,
    pub zoneinfo: String,
    pub phone_number: String,
    pub phone_number_verified: bool,
    pub updated_at: i64,
}
```
Step 8: Array Fixture Tests
What: End-to-end verification using fixture-based testing.
Test:
- fixtures/array_user_test_vectors.json
- fixtures/array_address_test_vectors.json
TypeSeq Values:
- 101 - UserArray
- 110 - AddressArray
- Custom arrays start from 1024
Array Test Pattern:
```rust
// Generated from TypeSpec @array decorator
struct UserArray {
    inner: PseudoArray<User>,
}

impl UserArray {
    fn new(seed: u64) -> Self {
        Self {
            inner: PseudoArray::new(seed, 101, generate_user),
        }
    }

    fn at(&self, index: i32) -> User {
        self.inner.at(index)
    }
}

fn generate_user(world_seed: u64, type_seq: u64, index: i32) -> User {
    let p = PrimitivesImpl::new(world_seed, type_seq, index);
    User {
        id: p.id(),
        sub: p.id(),
        name: p.composite_user_name(),
        given_name: p.gendered_given_name(),
        // ... populate all 20 fields
    }
}
```
Testing Strategy
Why Fixture-Based Testing?
Pseudata’s core value proposition is deterministic cross-language consistency: the same seed must produce identical data in every language. Fixture-based testing is the only way to guarantee this.
Without fixtures, you could have:
- Go generating "John Smith" for seed 42
- Java generating "John Smyth" due to a resource loading bug
- Each language passing its own tests, but cross-language compatibility broken
Fixtures provide a single source of truth that all languages must match exactly. This catches:
- RNG state bugs (calling rng() multiple times incorrectly)
- UTF-8 handling issues (byte vs. rune slicing)
- Type mapping errors (signed vs. unsigned integers)
- Resource access bugs (wrong locale fallback)
- Formatting inconsistencies (date/phone number patterns)
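Two of the bug classes listed above are easy to reproduce in isolation. The sketch below uses hypothetical helper names (as_signed_32, as_wide, first_chars, first_bytes) that are not part of Pseudata; it only illustrates how a value or a string slice can silently differ between languages and why fixtures catch it.

```rust
/// Reinterprets a uint32 as a signed 32-bit value, which is what happens
/// when a PCG32 output is stored in a Java-style int.
pub fn as_signed_32(x: u32) -> i32 {
    x as i32
}

/// Widens a uint32 to a signed 64-bit value, preserving the full range.
pub fn as_wide(x: u32) -> i64 {
    i64::from(x)
}

/// Takes the first `n` characters (Unicode scalar values) of `s`.
pub fn first_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Takes the first `n` bytes of `s`; returns None when the cut would
/// split a multi-byte UTF-8 character or run past the end.
pub fn first_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

For instance, 3,000,000,000 fits in a uint32 but becomes a negative number when reinterpreted as a signed 32-bit value, and cutting "José" after four bytes lands in the middle of the two-byte "é".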
Golden Test Pattern
All tests follow the same pattern:
- Go generates fixtures with the -update flag
- All languages read and validate against fixtures
- Fixtures are committed to the repository
Test Execution:
```bash
# Generate fixtures (from Go)
cd /path/to/pseudata
go test -run TestGeneratorWithVectors -update     # pcg32_test_vectors.json
go test -run TestSeedFromVectors -update          # seed_test_vectors.json
go test -run TestIDUtilsWithVectors -update       # id_test_vectors.json
go test -run TestPrimitivesWithVectors -update    # primitives_test_vectors.json
go test -run TestUserArrayWithVectors -update     # array_user_test_vectors.json
go test -run TestAddressArrayWithVectors -update  # array_address_test_vectors.json

# Run tests (Rust example)
cargo test
```
Test File Template
```rust
use serde::Deserialize;
use std::fs;

#[derive(Deserialize)]
struct TestCase {
    id: i32,
    description: String,
    inputs: TestInputs,     // component-specific input struct
    expected: TestExpected, // component-specific expected struct
}

#[test]
fn test_component_with_vectors() {
    let fixture = fs::read_to_string("../../fixtures/component_test_vectors.json")
        .expect("Run: cd ../../ && go test -update");

    let test_cases: Vec<TestCase> = serde_json::from_str(&fixture).unwrap();

    for tc in test_cases {
        // ... run test and compute `actual`
        assert_eq!(actual, tc.expected.value, "{} failed", tc.description);
    }
}
```
Package Management
Create package.json / Cargo.toml / setup.py:
```toml
# Cargo.toml example
[package]
name = "pseudata"
version = "0.0.1"
edition = "2021"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
uuid = "1.0"
once_cell = "1" # used by the generated resources.rs

[dev-dependencies]
# Test dependencies
```
Documentation
Create README.md in SDK directory:
````markdown
# Pseudata - Rust SDK

Deterministic mock data generation for testing.

## Installation

```toml
[dependencies]
pseudata = "0.0.1"
```

## Quick Start

```rust
use pseudata::{Generator, UserArray, seed_from};

// Generate deterministic users
let seed = seed_from("test-scenario-1");
let users = UserArray::new(seed);

let user = users.at(0);
println!("Email: {}", user.email);
println!("Name: {}", user.name);
```

## Running Tests

```bash
cargo test
```
````
Step 7: CI/CD Integration
Section titled “Step 7: CI/CD Integration”Once your SDK is implemented and tested locally, integrate it into the automated CI/CD pipeline.
7.1: Create Taskfile
Add a Taskfile.yml in your SDK directory (sdks/your-language/Taskfile.yml):
```yaml
version: "3"

tasks:
  setup:
    desc: Install dependencies for your language
    cmds:
      - # your dependency installation command
      # Example: cargo fetch (Rust)
      # Example: dotnet restore (C#)

  test:
    desc: Run tests
    cmds:
      - # your test command
      # Example: cargo test
      # Example: dotnet test

  lint:
    desc: Lint code
    cmds:
      - # your linter command
      # Example: cargo clippy
      # Example: dotnet format --verify-no-changes

  format:
    desc: Format code
    cmds:
      - # your formatter command
      # Example: cargo fmt
      # Example: dotnet format

  clean:
    desc: Clean build artifacts
    cmds:
      - # your clean command
      # Example: cargo clean
      # Example: dotnet clean
```
7.2: Update Root Taskfile
Add your SDK to the root Taskfile.yml:
```yaml
includes:
  # ... existing includes ...
  rust: # your language name
    taskfile: ./sdks/rust/Taskfile.yml
    dir: ./sdks/rust

tasks:
  # Update aggregate commands to include your SDK:
  setup:
    desc: Install dependencies for ALL languages
    deps: [go:setup, python:setup, typescript:setup, java:setup, typespec:setup, rust:setup]

  test:
    desc: Run tests for ALL languages
    deps: [go:test, python:test, typescript:test, java:test, rust:test]

  lint:
    desc: Run linters for ALL languages
    deps: [go:lint, python:lint, typescript:lint, java:lint, rust:lint]

  format:
    desc: Format ALL code
    deps: [go:format, python:format, typescript:format, java:format, typespec:format, rust:format]

  clean:
    desc: Clean ALL artifacts
    deps: [go:clean, python:clean, typescript:clean, java:clean, rust:clean]
```

7.3: Update Setup Environment Action
The setup-env composite action (.github/actions/setup-env/action.yml) centralizes all toolchain setup. Add your language here:
```yaml
# Add input parameter (after existing setup-node)
inputs:
  setup-rust:
    description: "Setup Rust toolchain (only used when all=false)"
    required: false
    default: "false"

# Add setup step
runs:
  using: "composite"
  steps:
    # ... existing steps ...

    - name: Setup Rust
      if: inputs.all == 'true' || inputs.setup-rust == 'true'
      uses: actions-rust-lang/setup-rust-toolchain@1fbea72663f6d4c03efaab13560c8a24cfd2a7cc # v1.0.0
      shell: bash
      with:
        toolchain: stable
        cache: true

    # For C#:
    - name: Setup .NET
      if: inputs.all == 'true' || inputs.setup-dotnet == 'true'
      uses: actions/setup-dotnet@6bd8b7f7774af54e05809fcc5431931b3eb1ddee # v4.0.1
      shell: bash
      with:
        dotnet-version: "9.0"
        cache: true
        cache-dependency-path: "sdks/csharp/YourProject.csproj"
```

Why Setup-Env Action?
- Single source of truth: Language versions defined once
- Consistency: Same setup across CI, validation, and release workflows
- Maintainability: Update Go version once, applies everywhere
- Automatic caching: Dependency caching configured per language
Security: All actions MUST be pinned to commit SHAs (not tags):
```yaml
# ❌ Wrong (tag-based, mutable)
uses: actions-rust-lang/setup-rust-toolchain@v1

# ✅ Correct (SHA-pinned, immutable)
uses: actions-rust-lang/setup-rust-toolchain@1fbea72663f6d4c03efaab13560c8a24cfd2a7cc # v1.0.0
```

7.4: Update Reusable CI Workflow
Update .github/workflows/reusable-ci.yml to pass your language flag to setup-env:
```yaml
steps:
  - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1

  - name: Setup Development Environment
    uses: ./.github/actions/setup-env
    with:
      all: "false"
      setup-go: ${{ inputs.sdk == 'go' }}
      setup-python: ${{ inputs.sdk == 'python' }}
      setup-java: ${{ inputs.sdk == 'java' }}
      setup-node: ${{ inputs.sdk == 'typescript' || inputs.sdk == 'typespec' }}
      setup-rust: ${{ inputs.sdk == 'rust' }} # Add your language
```

7.5: Create CI Workflow
Create .github/workflows/ci-rust.yml (or your language):
```yaml
# CI (Rust) Workflow
#
# Triggers the shared CI workflow for the Rust SDK.
# This workflow ensures that the Rust codebase (located in sdks/rust)
# passes all standard checks (setup, lint, test) across supported operating systems.
#
# Matrix Optimization & Triggers:
# - on: pull_request: Runs only on [ubuntu-latest] for fast feedback loops.
# - on: push: Expands matrix to [ubuntu-latest, windows-latest, macos-latest]
#   to ensure cross-platform compatibility after merging.

name: CI (Rust)
on:
  push:
    branches: ["main"]
    paths: ["sdks/rust/**", "Taskfile.yml", ".github/**"]
  pull_request:
    branches: ["main"]
    paths: ["sdks/rust/**", "Taskfile.yml", ".github/**"]

jobs:
  check:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        include:
          - os: ${{ github.event_name == 'push' && 'windows-latest' || '' }}
          - os: ${{ github.event_name == 'push' && 'macos-latest' || '' }}
        exclude:
          - os: ""

    uses: ./.github/workflows/reusable-ci.yml
    with:
      sdk: rust
      path: sdks/rust
      os: ${{ matrix.os }}
```

Key Points:
- Name Pattern: Must be CI (YourLanguage) - the aggregator discovers workflows matching ^CI\\(.+\\)
- Path Filters: Include your SDK directory, root Taskfile, and .github/** (covers all workflow/action changes)
- Matrix Strategy: Ubuntu-only on PRs, full matrix on push to main
- fail-fast: false: Tests all platforms even if one fails
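
A local pre-check of the naming convention can catch a workflow that the aggregator would miss. The Rust sketch below assumes the `CI (YourLanguage)` form described above and is deliberately stricter than the aggregator's pattern, since it also requires the closing parenthesis to end the name. `sdk_from_workflow_name` is a hypothetical helper, not part of the repository.

```rust
/// Extracts the language label from a workflow name of the form `CI (Label)`.
/// Hypothetical helper for illustration; the aggregator's real matching logic
/// lives in the repository's workflows and may be looser than this check.
pub fn sdk_from_workflow_name(name: &str) -> Option<&str> {
    // Require the exact "CI (" prefix and a trailing ")".
    let inner = name.strip_prefix("CI (")?.strip_suffix(')')?;
    // An empty label such as "CI ()" does not identify an SDK.
    if inner.is_empty() {
        None
    } else {
        Some(inner)
    }
}
```

Under these assumptions, `sdk_from_workflow_name("CI (Rust)")` yields `Some("Rust")`, while a name such as `Build (Rust)` yields `None`.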
7.6: Update Release Workflow
Add a publishing job to .github/workflows/release.yml:
```yaml
# Add after existing publish jobs:

# ============================================================================
# JOB: PUBLISH RUST (crates.io)
# Runs ONLY after validation passes.
# ============================================================================
publish-rust:
  needs: [release-please, validate] # Must wait for validation!
  if: ${{ needs.release-please.outputs.releases_created }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1

    - name: Setup Development Environment
      uses: ./.github/actions/setup-env
      with:
        all: "false"
        setup-rust: "true"

    - name: Publish
      env:
        CARGO_REGISTRY_TOKEN: ${{ secrets.CARGO_TOKEN }}
      run: task rust:publish
```

Key Changes from Old Pattern:
- Uses setup-env action instead of manual Task + toolchain setup
- Depends on validate job - ensures quality gate passes first
- Consistent with other publish jobs - same pattern across all languages
For Other Languages:
```yaml
# C# / NuGet
publish-csharp:
  needs: [release-please, validate]
  if: ${{ needs.release-please.outputs.releases_created }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1

    - name: Setup Development Environment
      uses: ./.github/actions/setup-env
      with:
        all: "false"
        setup-dotnet: "true"

    - name: Publish
      env:
        NUGET_API_KEY: ${{ secrets.NUGET_API_KEY }}
      run: task csharp:publish
```

7.7: Configure Repository Secrets
Add the required secrets in the GitHub repository settings (Settings → Secrets and variables → Actions):
Common Secrets (already configured):
- PYPI_TOKEN - Python package publishing
- NPM_TOKEN - TypeScript package publishing
- MAVEN_CENTRAL_USERNAME, MAVEN_CENTRAL_TOKEN - Java Maven Central (via Central Publishing Maven Plugin)
- MAVEN_GPG_PRIVATE_KEY, MAVEN_GPG_PASSPHRASE - Java artifact signing
Add for Your Language:
- Rust: CARGO_TOKEN (from crates.io)
- C#: NUGET_API_KEY (from nuget.org)
- Ruby: RUBYGEMS_API_KEY (from rubygems.org)
- PHP: No secret needed (Packagist auto-syncs from Git tags)
7.8: Test CI Workflow
- Push to feature branch:
  ```bash
  git checkout -b test/add-rust-sdk
  git add sdks/rust/ .github/
  git commit -m "feat: add Rust SDK with CI"
  git push origin test/add-rust-sdk
  ```
- Create Pull Request:
  - CI should trigger automatically
  - Check that CI (Rust) workflow runs
  - Verify tests pass on ubuntu-latest
  - Check that CI aggregator discovers and includes your workflow
- After PR approval, merge to main:
  - Full matrix (ubuntu, windows, macos) runs
  - Verify all platforms pass
- Test validation workflow (optional):
  ```bash
  # Manually trigger validation for your branch
  gh workflow run validate.yml --ref test/add-rust-sdk
  ```
- Test release workflow (after merge):
  - Wait for release-please to create release PR
  - Merge release PR
  - Verify validation runs and passes
  - Check publishing job runs correctly
  - Verify package appears in registry
7.9: Documentation Updates
Update these files to include your language:
Required:
- Main README.md - Add SDK to language list
- sdks/README.md - Add SDK documentation link
- Root Taskfile.yml - Add to aggregate commands (already done in 7.2)
- .github/actions/setup-env/action.yml - Add toolchain setup (already done in 7.3)
If applicable:
- typespec/docs/decorators/resources.md - If new resource types added
- typespec/src/resources.tsp - If new resources defined
- Cross-language compatibility matrix
Integration Checklist
Core Implementation
- Generator: PCG32 implementation with fixture tests
- SeedFrom: FNV-1a hash with fixture tests
- ID Utils: Encode/decode with fixture tests
- PseudoArray: Generic PseudoArray base class
- Primitives: Interface + implementation + fixture tests
- Models: User, Address data structures (auto-generated via quicktype)
- Arrays: UserArray + AddressArray + fixture tests
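
As a starting point for the Generator item above, the sketch below follows the publicly documented PCG32 (XSH-RR 64/32) reference algorithm in Rust. The `Pcg32` type, its `new(seed, stream)` constructor, and `next_u32` are hypothetical names, and the seeding convention is an assumption; whether the output matches the Go implementation should be confirmed against pcg32_test_vectors.json before relying on it.

```rust
/// Minimal PCG32 (XSH-RR 64/32) sketch based on the public reference algorithm.
/// Assumption: Pseudata's seeding and stream handling may differ, so always
/// validate against the fixture vectors.
pub struct Pcg32 {
    state: u64,
    inc: u64,
}

/// LCG multiplier used by the reference PCG32 implementation.
const MULTIPLIER: u64 = 6364136223846793005;

impl Pcg32 {
    /// Seeds the generator using the reference seeding steps:
    /// set the odd increment, advance once, add the seed, advance again.
    pub fn new(seed: u64, stream: u64) -> Self {
        let mut rng = Pcg32 { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    /// Advances the 64-bit state and returns the next 32-bit output.
    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // All arithmetic wraps modulo 2^64, as in the C reference.
        self.state = old.wrapping_mul(MULTIPLIER).wrapping_add(self.inc);
        // Output permutation: xorshift the high bits, then rotate right.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }
}
```

Two generators built from the same seed and stream produce the same sequence. That determinism is what the fixture tests rely on, and it is also why constructing a fresh generator for every draw repeats values (see Common Pitfalls).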
Code Generation
- Resource Emitter: Embed locale data from typespec/resources/
- Interface Emitter: Generate Primitives interface
- Array Emitter: Generate array classes + generator functions
CI/CD Integration
- Taskfile: Created sdks/your-language/Taskfile.yml with setup/test/lint/format/clean tasks
- Root Taskfile: Updated to include your language in aggregate commands
- Setup-Env Action: Added toolchain setup to .github/actions/setup-env/action.yml
- Reusable CI: Updated to pass language flag to setup-env action
- CI Workflow: Created .github/workflows/ci-yourlanguage.yml with matrix strategy
- Release Workflow: Added publish job to .github/workflows/release.yml (depends on validate)
- Secrets: Configured package registry API tokens in GitHub settings
- CI Testing: Verified workflow runs on draft PR and passes on all platforms
- Validation: Manually triggered validate.yml workflow for your branch
Package & Documentation
- Package: Published to language package registry
- SDK README: Installation and usage examples
- Main README: Updated language listing
- sdks/README: Updated SDK table
- Website Docs: Updated language support pages
Common Pitfalls
1. RNG State Management
❌ Wrong: Multiple rng() calls repeat values

```rust
fn bad(&self) -> String {
    format!("{}{}", self.rng().intn(10), self.rng().intn(10)) // Same digit twice!
}
```

✅ Correct: Store rng in a local variable

```rust
fn good(&self) -> String {
    let mut rng = self.rng();
    format!("{}{}", rng.intn(10), rng.intn(10)) // Two successive draws
}
```

2. UTF-8 Handling
Use byte-level operations for seed_from and rune/character operations for text primitives.
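
To make the byte versus character distinction concrete, here is a small Rust sketch. It assumes the 64-bit FNV-1a variant (the checklist names FNV-1a, but this guide section does not state the width), and `fnv1a_64` and `first_n_chars` are hypothetical helper names rather than Pseudata APIs.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hashes the UTF-8 bytes of `input` (not its characters), so every
/// language that feeds the same bytes gets the same seed.
pub fn fnv1a_64(input: &str) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for byte in input.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Truncates by Unicode scalar values (Rust `char`), never by bytes,
/// so a multi-byte character is not split in half.
pub fn first_n_chars(text: &str, n: usize) -> String {
    text.chars().take(n).collect()
}
```

For example, "héllo" is 6 bytes long but 5 characters long, so hashing must walk 6 bytes while a text primitive that truncates to two characters should return "hé". Note that Rust's `char` is a Unicode scalar value, comparable to a Go rune, and not a grapheme cluster.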
3. Unsigned Integer Handling
JavaScript and TypeScript need json-bigint to handle 64-bit integers.
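
The underlying reason is that a JavaScript number is an IEEE 754 double, which represents integers exactly only up to 2^53 - 1. The Rust sketch below reproduces the effect by converting a u64 to f64 and back; `MAX_SAFE_INTEGER` and `is_exact_as_f64` are illustrative names, not part of any SDK.

```rust
/// Largest integer a double can hold without losing neighbouring values
/// (the same bound as JavaScript's Number.MAX_SAFE_INTEGER).
pub const MAX_SAFE_INTEGER: u64 = (1 << 53) - 1;

/// Returns true when `value` survives a round trip through f64 unchanged.
/// The comparison is done in u128 because `u64::MAX as f64` rounds up to
/// 2^64, which would saturate back to u64::MAX and hide the loss.
pub fn is_exact_as_f64(value: u64) -> bool {
    (value as f64) as u128 == u128::from(value)
}
```

Values such as 2^53 + 1 or u64::MAX fail this check, which is the same silent rounding a plain JSON.parse would apply to 64-bit seeds and IDs in fixture files.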
4. Float Literal Suffixes
Java requires an f suffix on float literals in generated code.
5. Resource Field Naming
Match TypeSpec camelCase in runtime code: maleGivenNames, not male_given_names.
Getting Help
- Reference Implementations: Check Go, Java, Python, TypeScript in sdks/
- Test Vectors: All fixtures in fixtures/ directory
- TypeSpec: Definitions in typespec/src/
- GitHub Issues: Report problems or ask questions
Contributing Your Language
Once implemented and tested:
- Create Pull Request with SDK in sdks/your-language/
- Include CI/CD Integration:
  - Taskfile with all standard tasks
  - CI workflow file
  - Release workflow updates
  - Updated root Taskfile
- Update Documentation:
  - Main README.md with language listing
  - sdks/README.md with SDK table entry
  - Cross-language compatibility matrix
  - Website language support pages
- Test Thoroughly:
  - All fixture tests pass locally
  - CI passes on all platforms (ubuntu, windows, macos)
  - Release workflow tested on staging branch
- Provide Examples:
  - SDK README with installation instructions
  - Usage examples for all major features
  - Package registry publication proof
Review Process:
- Maintainers will verify cross-language consistency
- Check CI/CD integration completeness
- Review code quality and documentation
- Test package installation from registry
Last Updated: 2025-01-07
© 2025 Pseudata Project. Open Source under Apache License 2.0.