Contributing to Pseudata

Thank you for your interest in contributing to Pseudata! This guide will help you understand the different ways you can contribute and point you to the right resources.

What is Pseudata?

Pseudata is a deterministic mock data generation library that produces identical data across all supported programming languages for the same seed. This cross-language consistency is achieved through:

PCG32 algorithm: Deterministic pseudo-random number generation
TypeSpec definitions: Single source of truth for data structures
Fixture-based testing: Ensures outputs match exactly across languages
Code generation: Automated SDK creation from TypeSpec
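PCG32 suits cross-language determinism because its entire state is two 64-bit integers updated with wrapping unsigned arithmetic, and every target language can reproduce that bit for bit. The Go sketch below follows the published pcg32 reference algorithm (PCG-XSH-RR with `pcg32_srandom_r`-style seeding). It is not copied from Pseudata's Generator. The names `PCG32`, `NewPCG32`, and `Next` are hypothetical, and this sketch does not settle how Pseudata maps a user seed onto `initState` and `initSeq`.

```go
package main

import "fmt"

// PCG32 holds the generator state: a 64-bit LCG state and an odd increment
// that selects one of 2^63 distinct streams. (Hypothetical type name.)
type PCG32 struct {
	state uint64
	inc   uint64
}

// pcgMultiplier is the 64-bit LCG multiplier used by the pcg32 reference code.
const pcgMultiplier uint64 = 6364136223846793005

// NewPCG32 seeds a generator the way the reference pcg32_srandom_r does.
func NewPCG32(initState, initSeq uint64) *PCG32 {
	r := &PCG32{inc: initSeq<<1 | 1} // the increment must be odd
	r.Next()
	r.state += initState
	r.Next()
	return r
}

// Next advances the LCG (uint64 arithmetic wraps mod 2^64) and returns the
// XSH-RR permutation of the previous state: xorshift the high bits, then rotate.
func (r *PCG32) Next() uint32 {
	old := r.state
	r.state = old*pcgMultiplier + r.inc
	xorshifted := uint32(((old >> 18) ^ old) >> 27)
	rot := uint32(old >> 59)
	return xorshifted>>rot | xorshifted<<((-rot)&31)
}

func main() {
	a, b := NewPCG32(42, 54), NewPCG32(42, 54)
	for i := 0; i < 3; i++ {
		// Same seed and stream, so both columns are always identical.
		fmt.Printf("%08x %08x\n", a.Next(), b.Next())
	}
}
```

Other SDKs have to emulate the same wrapping behavior. Python can mask results to 64 bits, TypeScript can use `BigInt`, and Java can use `long` arithmetic plus unsigned shifts (`>>>`). Getting this wrong produces the signed vs. unsigned class of bug that fixtures are meant to catch.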

Pseudata initially launched with support for Go, Java, Python, and TypeScript. The roadmap expands this to nine languages by adding C#, PHP, Rust, Swift, and Dart, and community contributions are welcome for additional language implementations.

Ways to Contribute

Choose the contribution type that matches what you want to add:

1. Add a New Model

What: Define new data structures (like Person, Product, Order) with their corresponding PseudoArrays.

When to use:

You need a new type of test data (e.g., Invoice, Event)
You want to generate arrays of complex objects
Your use case requires domain-specific data structures

Key skills: TypeSpec, understanding of code generation

→ Read the Add a New Model guide

2. Add a New Primitive

What: Implement fundamental data generation methods (like email(), hexColor(), phoneNumber()).

When to use:

You need a new type of random value
The primitive will be reused across multiple models
You want to extend Pseudata’s base functionality

Key skills: Go, Java, Python, TypeScript, understanding of deterministic generation

Important: A new primitive must be implemented in all supported languages to maintain consistency.
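To show what "consistency" means for a primitive, the sketch below implements a hypothetical `HexColor` in Go. The `Source` interface and the `fixedSource` replay helper are assumptions that stand in for Pseudata's real Generator. The sketch illustrates the contract every SDK would have to share: how many values the primitive draws from the RNG (here exactly one), which bits it keeps, and how it formats them (lowercase, zero-padded to six digits).

```go
package main

import "fmt"

// Source is a hypothetical stand-in for Pseudata's generator: anything that
// yields the next 32-bit value of a deterministic stream.
type Source interface {
	Next() uint32
}

// HexColor is a hypothetical primitive. It consumes exactly ONE value from
// the stream and keeps the low 24 bits, so every SDK advances the RNG the
// same way and formats the same digits.
func HexColor(src Source) string {
	v := src.Next() & 0xFFFFFF
	return fmt.Sprintf("#%06x", v) // lowercase, zero-padded to 6 hex digits
}

// fixedSource replays a fixed sequence, which makes exact outputs easy to show.
type fixedSource struct {
	vals []uint32
	i    int
}

func (f *fixedSource) Next() uint32 {
	v := f.vals[f.i%len(f.vals)]
	f.i++
	return v
}

func main() {
	src := &fixedSource{vals: []uint32{0xDEADBEEF, 0x00000A0B}}
	fmt.Println(HexColor(src)) // #adbeef
	fmt.Println(HexColor(src)) // #000a0b
}
```

Suppose the Java port formatted with uppercase digits or drew a second value for an alpha channel. Each SDK's own unit tests could still pass, but the cross-language fixtures would fail.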

→ Read the Add a New Primitive guide

3. Add a New Resource

What: Add locale-specific data files (names, cities, words) for realistic test data.

When to use:

You need vocabulary for text generation (nouns, adjectives)
You’re implementing primitives that need locale-specific lists
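Resources are most useful when a primitive can index into them deterministically. The Go sketch below assumes resource lists are loaded into a per-locale map and that a single 32-bit random value is reduced with modulo. The map layout, the `Pick` function, and the modulo reduction are all hypothetical. Whatever Pseudata actually does, every SDK must use the same reduction and keep each list in the same order as its source file.

```go
package main

import "fmt"

// resources is a hypothetical in-memory view of locale data files; the real
// file layout and loader may differ.
var resources = map[string]map[string][]string{
	"en_US": {"nouns": {"apple", "bridge", "candle", "dragon"}},
	"ja_JP": {"nouns": {"りんご", "橋", "ろうそく"}},
}

// Pick maps one 32-bit random value onto a resource list. Reducing with
// modulo is an assumption: any reduction works if every SDK uses the same one.
func Pick(locale, list string, r uint32) (string, error) {
	items := resources[locale][list] // reading a missing key yields nil, not a panic
	if len(items) == 0 {
		return "", fmt.Errorf("no %q resource for locale %q", list, locale)
	}
	return items[r%uint32(len(items))], nil
}

func main() {
	w, _ := Pick("en_US", "nouns", 6)
	fmt.Println(w) // 6 % 4 = 2, so "candle"
	w, _ = Pick("ja_JP", "nouns", 6)
	fmt.Println(w) // 6 % 3 = 0, so "りんご"
	_, err := Pick("fr_FR", "nouns", 6)
	fmt.Println(err) // missing locale is reported, not silently defaulted
}
```

Modulo reduction slightly favors earlier entries when the list length does not divide 2^32. It is, however, trivial to reproduce identically in every language.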

Key skills: Data curation, optional primitive implementation

→ Read the Add a New Resource guide

4. Add a New Locale

What: Add support for a new country or region (e.g., adding ja_JP for Japan).

When to use:

You need culturally appropriate names and addresses for a new country
You want to expand Pseudata’s international coverage

Key skills: Linguistic knowledge, data collection

→ Read the Add a New Locale guide

5. Add a New SDK

What: Implement full Pseudata support for a new programming language (e.g., Rust, C#, PHP, Ruby).

When to use:

Your team uses a language not yet supported
You want to enable Pseudata for a major ecosystem
You’re comfortable with language internals and tooling

Key skills: Deep knowledge of target language, TypeScript (for emitters), understanding of all Pseudata components

Important: This is a significant undertaking requiring implementation of Generator, emitters, primitives, and comprehensive testing.

→ Read the Add a New SDK guide

The Foundation: Fixture-Based Testing

All contributions must include fixture-based tests. This is not optional—it’s the foundation of Pseudata’s cross-language consistency guarantee.

Why Fixtures Matter

Without fixtures, you could have:

Go generating "John Smith" for seed 42
Java generating "John Smyth" due to a subtle bug
Each language passing its own tests, but cross-language compatibility broken

Fixtures provide a single source of truth: pre-generated test vectors that all languages must match exactly.

How It Works

Go as Reference: Implement in Go first, run go test -update to generate fixture JSON
Cross-Language Verification: Other languages load the same fixtures and verify outputs match
Continuous Validation: Any code change that breaks consistency fails tests immediately

What Fixtures Catch

RNG state bugs (calling rng() multiple times incorrectly)
UTF-8 handling issues (byte vs. rune slicing)
Type mapping errors (signed vs. unsigned integers)
Resource access bugs (wrong locale, missing data)
Formatting inconsistencies (date/time/phone patterns)

→ Read the detailed Testing Guide

Quick Start

Fork the repository: github.com/pseudata/pseudata
Choose your contribution type: Follow one of the five guides above
Implement your changes: Follow the step-by-step instructions
Add fixture tests: Ensure cross-language consistency
Run all tests: Verify nothing broke
Submit a pull request: Include description and test results

Code Generation Architecture

Understanding the code generation flow helps you work effectively with Pseudata:

Code Generation Flow

Pseudata uses TypeSpec as the source of truth and generates code through multiple paths:

Models: TypeSpec → JSON Schema → quicktype → language-specific classes
Primitives: TypeSpec → interface-emitter.js → language-specific interfaces
Arrays: TypeSpec → array-emitter.js → language-specific array classes
Resources: Text files → resource-emitter.js → embedded language-specific data
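The real emitters are JavaScript files under typespec/lib. The following is therefore only a Go illustration of the Resources path, using assumed names: a newline-separated text file becomes a string-slice literal embedded in generated source. `EmitGoResource` and the shape of its output are hypothetical. One detail matters in any implementation: entries keep their file order, because index-based lookups in every SDK depend on it.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// EmitGoResource turns a newline-separated resource file into Go source that
// declares a string slice. Blank lines are skipped; order is preserved.
func EmitGoResource(varName, text string) (string, error) {
	var b strings.Builder
	fmt.Fprintf(&b, "var %s = []string{\n", varName)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// strconv.Quote produces a valid Go string literal for any entry.
		fmt.Fprintf(&b, "\t%s,\n", strconv.Quote(line))
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	b.WriteString("}\n")
	return b.String(), nil
}

func main() {
	src, err := EmitGoResource("enUSNouns", "apple\nbridge\n\ncandle\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```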

Development Setup

Pseudata uses Dev Containers to ensure all contributors have an identical development environment. This eliminates “works on my machine” issues and guarantees consistent builds and tests across all contributions.

Requirements

Install Docker
Install VS Code
Install the Dev Containers extension

Note: While this guide documents VS Code, any dev container-compatible tool can be used. See the full list of supporting tools for alternatives like JetBrains IDEs, Neovim, and others.

Setup

# Clone the repository
git clone https://github.com/pseudata/pseudata.git
cd pseudata

# Open in VS Code
code .

When prompted, click “Reopen in Container” (or run “Dev Containers: Reopen in Container” from the Command Palette).

The dev container includes everything pre-configured:

Go, Java, Python, Node.js with correct versions
All build tools and dependencies
Task (task runner for build automation)

Working in the Container

# The monorepo structure:
pseudata/                # Main implementation
├── typespec/            # TypeSpec definitions
│   ├── src/             # TypeSpec model definitions
│   ├── lib/             # Custom emitters (interface, array, resource)
│   └── resources/       # Locale-specific data files
├── sdks/                # Language SDKs
│   ├── go/
│   ├── java/
│   ├── python/
│   └── typescript/
└── fixtures/            # Test vectors

# Run code generation
task generate

# Run tests for each language
task test

# See all available tasks
task --list

Code Standards

Code Quality

Follow language conventions: PascalCase for exported Go identifiers (camelCase for unexported ones), camelCase for Java/TS, snake_case for Python
Add documentation: Doc comments for all public APIs
Handle UTF-8 correctly: Use rune/character operations, not byte slicing
Match existing patterns: Study the codebase before implementing
EditorConfig: The repository includes .editorconfig for consistent formatting (indentation, line endings, charset). Your editor will automatically apply these settings.

Documentation Standards

Follow language-specific documentation conventions for all public APIs:

Go: Use GoDoc format

// Email generates a deterministic email address.
// The format is lowercase-username@domain.tld.
func (p *PrimitivesImpl) Email() string {
	// ...
}

Java: Use JavaDoc format

/**
 * Generates a deterministic email address.
 * The format is lowercase-username@domain.tld.
 * @return a valid email address string
 */
public String email() {
    // ...
}

Python: Use PEP 257 docstrings

def email(self) -> str:
    """Generate a deterministic email address.

    The format is lowercase-username@domain.tld.

    Returns:
        str: A valid email address string
    """

TypeScript: Use TSDoc format

/**
 * Generates a deterministic email address.
 * The format is lowercase-username@domain.tld.
 * @returns A valid email address string
 */
email(): string {
  // ...
}

Testing Requirements

Fixture tests are mandatory: All changes must include fixture-based tests
Cross-language consistency: Verify outputs match across all languages
Edge cases: Test boundary conditions, empty inputs, special characters
Performance: Keep generation fast; array element access must be O(1)
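O(1) access implies that element i cannot be produced by stepping a single RNG i times. It has to be derived directly from the seed and the index. The Go sketch below shows one way to do that, using the SplitMix64 finalizer as the mixing function. The `PseudoArray` type, `At`, and the choice of SplitMix64-style mixing are assumptions for illustration, not Pseudata's documented derivation. The uint64 element stands in for a generated record such as a Person.

```go
package main

import "fmt"

// mix64 is the SplitMix64 finalizer. Each step (xorshift, multiply by an odd
// constant) is invertible, so the whole function is a bijection on uint64.
func mix64(z uint64) uint64 {
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// PseudoArray is a hypothetical virtual array: elements are never stored.
type PseudoArray struct {
	seed uint64
	size int
}

// At derives element i from (seed, i) alone, so access costs O(1) for any i
// and needs no state from earlier elements.
func (a PseudoArray) At(i int) uint64 {
	if i < 0 || i >= a.size {
		panic(fmt.Sprintf("index %d out of range [0, %d)", i, a.size))
	}
	return mix64(a.seed + uint64(i)*0x9e3779b97f4a7c15)
}

func main() {
	people := PseudoArray{seed: 42, size: 1_000_000_000}
	// Reading the last element is as cheap as reading the first.
	fmt.Println(people.At(999_999_999) == people.At(999_999_999))
	fmt.Println(people.At(0) != people.At(1))
}
```

The index is multiplied by an odd constant and then passed through a bijective mixer. As a result, distinct indices always yield distinct raw values, and an array of a billion entries costs nothing until it is read.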

Commit Message Convention

Pseudata follows Conventional Commits to maintain a clear and structured commit history. This helps with automated changelog generation and makes it easier to understand what each commit does.

Format

<type>(<scope>): <subject>

[optional body]

[optional footer]

Types

feat: A new feature
fix: A bug fix
docs: Documentation only changes
style: Code style changes (formatting, missing semicolons, etc.)
refactor: Code change that neither fixes a bug nor adds a feature
perf: Performance improvement
test: Adding missing tests or correcting existing tests
chore: Changes to the build process or auxiliary tools

Scopes

Use scopes to indicate which part of the codebase is affected:

go, java, python, typescript, csharp, php, rust, swift, dart: Language-specific changes
generator: Changes to the PCG32 generator
primitives: Changes to primitive methods
arrays: Changes to array classes
resources: Changes to locale/resource data
typespec: Changes to TypeSpec definitions
emitter: Changes to code generation emitters
fixtures: Changes to test fixtures
docs: Documentation changes
website: Changes to the documentation website

Examples

# Adding a new feature
feat(primitives): add hexColor() method

# Fixing a bug
fix(java): correct UUID generation for large seeds

# Updating documentation
docs(contributing): add conventional commits section

# Refactoring code
refactor(typescript): simplify resource loading logic

# Adding tests
test(fixtures): add edge cases for digit() primitive

# Multiple scopes
feat(go,java,python,typescript): implement alnum() primitive

Breaking Changes

If your commit introduces a breaking change, add BREAKING CHANGE: in the footer:

feat(primitives): rename int() to nextInt()

BREAKING CHANGE: int() method has been renamed to nextInt() for consistency
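A local check for these conventions can be small. The Go sketch below validates the header against the types listed above and detects a `BREAKING CHANGE:` footer. It is a hypothetical helper, not a tool shipped with Pseudata. The allowed scope characters (lowercase letters, digits, commas) are an assumption. The scope is treated as optional, as the Conventional Commits specification allows.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headerRE accepts <type>(<scope>): <subject> using the types listed above.
var headerRE = regexp.MustCompile(`^(feat|fix|docs|style|refactor|perf|test|chore)(?:\(([a-z0-9,]+)\))?: (\S.*)$`)

// Commit holds the parsed pieces of a commit message.
type Commit struct {
	Type     string
	Scopes   []string
	Subject  string
	Breaking bool
}

// ParseCommit checks the header line and scans later lines for the footer.
func ParseCommit(msg string) (Commit, error) {
	lines := strings.Split(msg, "\n")
	m := headerRE.FindStringSubmatch(lines[0])
	if m == nil {
		return Commit{}, fmt.Errorf("header does not match <type>(<scope>): <subject>: %q", lines[0])
	}
	c := Commit{Type: m[1], Subject: m[3]}
	if m[2] != "" {
		c.Scopes = strings.Split(m[2], ",") // e.g. "go,java" -> [go java]
	}
	for _, l := range lines[1:] {
		if strings.HasPrefix(l, "BREAKING CHANGE:") {
			c.Breaking = true
		}
	}
	return c, nil
}

func main() {
	msg := "feat(primitives): rename int() to nextInt()\n\nBREAKING CHANGE: int() method has been renamed to nextInt() for consistency"
	c, err := ParseCommit(msg)
	fmt.Printf("%+v %v\n", c, err)
}
```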

Changelog

Pseudata maintains a changelog following the Keep a Changelog format. The changelog provides a human-readable history of notable changes for each version.

When to Update the Changelog

Update CHANGELOG.md when making user-facing changes:

feat: Add entry under “Added” for new features
fix: Add entry under “Fixed” for bug fixes
BREAKING CHANGE: Add entry under “Changed” with clear migration guidance
perf: Add entry under “Changed” for significant performance improvements
deprecation: Add entry under “Deprecated” for features being phased out

Skip changelog updates for:

Internal refactoring (refactor)
Test changes (test)
Documentation updates (docs)
Build/tooling changes (chore)
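The rules above amount to a small lookup from commit type to Keep a Changelog section. The Go sketch below encodes them. `ChangelogSection` is hypothetical, and because "deprecation" is not one of the commit types, the sketch models it as a separate flag (an assumption).

```go
package main

import "fmt"

// ChangelogSection maps a commit to its Keep a Changelog heading, or returns
// "" when the change needs no entry.
func ChangelogSection(commitType string, breaking, deprecation bool) string {
	switch {
	case breaking:
		return "Changed" // and include migration guidance
	case deprecation:
		return "Deprecated"
	}
	switch commitType {
	case "feat":
		return "Added"
	case "fix":
		return "Fixed"
	case "perf":
		return "Changed" // only for significant improvements
	default:
		// refactor, test, docs, chore; style is treated the same (an assumption).
		return ""
	}
}

func main() {
	for _, t := range []string{"feat", "fix", "perf", "refactor"} {
		fmt.Printf("%-8s -> %q\n", t, ChangelogSection(t, false, false))
	}
}
```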

Changelog Format

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- New `hexColor()` primitive for generating hex color codes
- Support for Rust language SDK

### Changed
- Improved performance of `digit()` by 40%

### Fixed
- UTF-8 handling in `nickname()` for non-ASCII characters
- UUID generation for large seed values in Java

### Deprecated
- `emailVerified()` will be removed in v2.0.0, use `probability(0.9)` instead

## [1.0.0] - 2024-01-15

### Added
- Initial release with Go, Java, Python, and TypeScript support
...

Best Practices

Be specific: Include method names, affected languages, and clear descriptions
User-focused: Write for library users, not implementation details
Link issues: Reference related GitHub issues where applicable, e.g. (#123)
Breaking changes first: Always list breaking changes at the top of a release section
Group by language: For language-specific changes, group entries together

Example Entries

### Added
- `alnum()` primitive for alphanumeric strings (all languages)
- Java: Added `BigInteger` support for large world seeds

### Fixed
- Python: Fixed resource loading for Windows paths
- TypeScript: Corrected type definitions for `PseudoArray`

### Changed
- **BREAKING**: Renamed `int()` to `nextInt()` for API consistency (all languages)
  - Migration: Replace `p.int()` with `p.nextInt()`

Versioning

Pseudata follows Semantic Versioning 2.0.0 (SemVer):

Given a version number MAJOR.MINOR.PATCH:

MAJOR (e.g., 1.0.0 → 2.0.0): Incompatible API changes
- Breaking changes to public APIs
- Removal of deprecated features
- Changes to deterministic output (same seed produces different data)
MINOR (e.g., 1.0.0 → 1.1.0): Backwards-compatible new functionality
- New primitives, models, or arrays
- New locale/resource support
- New language SDK implementations
- Performance improvements
PATCH (e.g., 1.0.0 → 1.0.1): Backwards-compatible bug fixes
- Bug fixes that change neither the API nor the output for any seed
- Documentation updates
- Internal refactoring

Important: Because Pseudata is a deterministic data generation library, changes that alter the output for a given seed are considered BREAKING CHANGES, even if the API signature remains the same.

Pre-release Versions

Alpha (1.0.0-alpha.1): Early development, unstable
Beta (1.0.0-beta.1): Feature-complete, testing phase
Release Candidate (1.0.0-rc.1): Final testing before stable release
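Pre-release versions sort below the release they precede. SemVer 2.0.0 compares the identifiers after the hyphen field by field: numeric fields numerically, other fields in ASCII order, with numeric fields ranking below alphanumeric ones. The Go sketch below uses a hypothetical `Compare` that ignores build metadata and assumes well-formed input. It confirms the ordering alpha < beta < rc < stable.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// compareIdent compares one dot-separated identifier: numeric identifiers
// numerically, others in ASCII order, and numeric below alphanumeric.
func compareIdent(x, y string) int {
	xn, xErr := strconv.ParseUint(x, 10, 64)
	yn, yErr := strconv.ParseUint(y, 10, 64)
	switch {
	case xErr == nil && yErr == nil:
		switch {
		case xn < yn:
			return -1
		case xn > yn:
			return 1
		}
		return 0
	case xErr == nil:
		return -1
	case yErr == nil:
		return 1
	}
	return strings.Compare(x, y)
}

// comparePre compares pre-release strings; "" means a stable release.
func comparePre(a, b string) int {
	switch {
	case a == b:
		return 0
	case a == "":
		return 1 // a release outranks any of its pre-releases
	case b == "":
		return -1
	}
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		if c := compareIdent(as[i], bs[i]); c != 0 {
			return c
		}
	}
	switch { // all shared fields equal: more fields ranks higher
	case len(as) < len(bs):
		return -1
	case len(as) > len(bs):
		return 1
	}
	return 0
}

// Compare orders two well-formed SemVer versions, returning -1, 0, or 1.
func Compare(v, w string) int {
	vCore, vPre, _ := strings.Cut(v, "-")
	wCore, wPre, _ := strings.Cut(w, "-")
	vs, ws := strings.Split(vCore, "."), strings.Split(wCore, ".")
	for i := 0; i < 3; i++ {
		if c := compareIdent(vs[i], ws[i]); c != 0 {
			return c
		}
	}
	return comparePre(vPre, wPre)
}

func main() {
	vs := []string{"1.0.0", "1.0.0-rc.1", "1.0.0-alpha.1", "1.0.0-beta.1"}
	sort.Slice(vs, func(i, j int) bool { return Compare(vs[i], vs[j]) < 0 })
	fmt.Println(vs) // [1.0.0-alpha.1 1.0.0-beta.1 1.0.0-rc.1 1.0.0]
}
```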

License

Pseudata is licensed under the Apache License 2.0. By contributing, you agree that your contributions will be licensed under the same terms.

What this means for contributors:

✅ You retain copyright to your contributions
✅ You grant the project a perpetual, worldwide, non-exclusive license to use your code
✅ You confirm you have the right to submit the contribution
✅ Your contributions can be used commercially, modified, and distributed
✅ Patent grants are included (protection against patent claims)

Key requirements:

Include the Apache 2.0 license header in new files (when applicable)
Preserve existing copyright notices
Add any required attribution notices (for example, for bundled third-party material) to the NOTICE file

For the full license text, see LICENSE in the repository.

If you’re contributing on behalf of your employer, ensure you have permission to contribute code under this license.

Pull Request Process

Branch naming: feature/model-invoice, fix/utf8-nickname, lang/rust
Commit messages: Follow Conventional Commits format (see above)
Update CHANGELOG.md: Add entries for user-facing changes under [Unreleased]
PR description: Explain what, why, how; include before/after examples
Tests passing: All language tests must pass
Documentation: Update relevant docs if adding new features

Getting Help

GitHub Issues: Report bugs or request features
GitHub Discussions: Ask questions, share ideas
Reference Implementations: Check existing Go/Java/Python/TypeScript code
TypeSpec Docs: typespec.io for TypeSpec questions

Code of Conduct

Pseudata is committed to providing a welcoming and inclusive environment for all contributors. Please read the Code of Conduct to understand the standards expected from everyone participating in the Pseudata community.

Recognition

Contributors are recognized in:

Repository CONTRIBUTORS file
Release notes for their contributions
GitHub contributor graph

Thank you for helping make Pseudata better! 🎉