Contributing to Pseudata
Thank you for your interest in contributing to Pseudata! This guide will help you understand the different ways you can contribute and point you to the right resources.
What is Pseudata?
Pseudata is a deterministic mock data generation library that produces identical data in every supported programming language for the same seed. This cross-language consistency is achieved through:
- PCG32 algorithm: Deterministic pseudo-random number generation
- TypeSpec definitions: Single source of truth for data structures
- Fixture-based testing: Ensures outputs match exactly across languages
- Code generation: Automated SDK creation from TypeSpec
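The PCG32 step can be sketched in Python. This follows the XSH-RR output function and the two-step seeding of the PCG reference implementation (pcg32_srandom_r). The class name, the `stream` parameter, and its default of 54 are assumptions, and Pseudata's actual seeding may differ. The point is that only fixed-width integer arithmetic is involved, so any language can reproduce the same sequence by masking to 64 and 32 bits.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MULTIPLIER = 6364136223846793005  # PCG's 64-bit LCG multiplier


class PCG32:
    """Minimal PCG32 (XSH-RR) sketch; this API is hypothetical."""

    def __init__(self, seed: int, stream: int = 54) -> None:
        # Seeding mirrors pcg32_srandom_r from the PCG reference code.
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # the increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self) -> int:
        old = self.state
        # Advance the 64-bit LCG; Python ints are unbounded, so mask explicitly.
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then rotate by the top 5 bits.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


# Same seed and stream give the same sequence, in any language implementing PCG32.
a = PCG32(42)
b = PCG32(42)
print([a.next_u32() for _ in range(3)] == [b.next_u32() for _ in range(3)])
```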
Pseudata initially launched with support for Go, Java, Python, and TypeScript. The roadmap expands this to nine languages by adding C#, PHP, Rust, Swift, and Dart, and community contributions are welcome for other languages.
Ways to Contribute
Choose the contribution type that matches what you want to add:
1. Add a New Model
What: Define new data structures (like Person, Product, Order) with their corresponding PseudoArrays.
When to use:
- You need a new type of test data (e.g., Invoice, Event)
- You want to generate arrays of complex objects
- Your use case requires domain-specific data structures
Key skills: TypeSpec, understanding of code generation
→ Read the Add a New Model guide
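To make the idea concrete, here is a Python sketch of how a model might be filled from a deterministic 32-bit source. The Invoice model, its fields, and generate_invoice are hypothetical illustrations, not Pseudata's generated code. What matters is that fields are drawn in a fixed order, so every SDK consumes the generator identically and produces the same object for the same seed.

```python
from dataclasses import dataclass
from typing import Callable

# Any deterministic source of 32-bit values, e.g. a PCG32 generator's next method.
NextU32 = Callable[[], int]


@dataclass(frozen=True)
class Invoice:
    """Hypothetical model; real models are generated from TypeSpec."""

    number: str
    amount_cents: int
    paid: bool


def generate_invoice(next_u32: NextU32) -> Invoice:
    # Draw fields in declaration order: reordering these lines changes the
    # output for every seed, which would break cross-language fixtures.
    number = f"INV-{next_u32() % 1_000_000:06d}"
    amount_cents = next_u32() % 100_000
    paid = next_u32() % 2 == 0
    return Invoice(number=number, amount_cents=amount_cents, paid=paid)


# Usage with a fixed stand-in source instead of a real generator:
values = iter([42, 12_345, 3])
print(generate_invoice(lambda: next(values)))
```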
2. Add a New Primitive
What: Implement fundamental data generation methods (like email(), hexColor(), phoneNumber()).
When to use:
- You need a new type of random value
- The primitive will be reused across multiple models
- You want to extend Pseudata’s base functionality
Key skills: Go, Java, Python, TypeScript, understanding of deterministic generation
Important: A new primitive must be implemented in every supported language to maintain consistency.
→ Read the Add a New Primitive guide
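As an illustration of a primitive, here is a Python sketch of a hexColor-style method. Passing the random source in as a parameter and the bit layout (the low 24 bits of one 32-bit draw) are assumptions for this example. Whatever layout is chosen, every language must consume the same number of draws and format the result identically.

```python
from typing import Callable


def hex_color(next_u32: Callable[[], int]) -> str:
    """Return a color like '#1a2b3c' from exactly one 32-bit draw (hypothetical)."""
    value = next_u32() & 0xFFFFFF  # keep 24 bits: 8 each for red, green, blue
    return f"#{value:06x}"  # zero-pad so every result has six lowercase hex digits


print(hex_color(lambda: 0x12ABCDEF))  # #abcdef
```

Drawing a single 32-bit value keeps RNG consumption easy to mirror in Go, Java, and TypeScript, which offer the same masking and zero-padded lowercase hex formatting.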
3. Add a New Resource
What: Add locale-specific data files (names, cities, words) for realistic test data.
When to use:
- You need vocabulary for text generation (nouns, adjectives)
- You’re implementing primitives that need locale-specific lists
Key skills: Data curation, optional primitive implementation
→ Read the Add a New Resource guide
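Once a resource exists, a primitive typically selects an entry from it. The Python sketch below is illustrative only: the RESOURCES layout, the sample words, and pick are hypothetical, and reducing with modulo is a simplification that is slightly biased when the list size does not divide 2^32. Whatever reduction Pseudata uses, all SDKs must apply the same one.

```python
from typing import Callable, Dict, Tuple

# Hypothetical embedded resources, keyed by locale, then by resource name.
RESOURCES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "en_US": {"adjectives": ("quick", "lazy", "bright")},
    "de_DE": {"adjectives": ("schnell", "faul", "hell")},
}


def pick(locale: str, resource: str, next_u32: Callable[[], int]) -> str:
    """Deterministically choose one entry from a locale-specific list."""
    items = RESOURCES[locale][resource]  # KeyError if the locale or list is missing
    return items[next_u32() % len(items)]  # exactly one draw per pick


print(pick("de_DE", "adjectives", lambda: 7))  # faul
```

Failing loudly on a missing locale, rather than silently falling back, makes resource access bugs visible in fixture tests.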
4. Add a New Locale
What: Add support for a new country or region (e.g., adding ja_JP for Japan).
When to use:
- You need culturally appropriate names and addresses for a new country
- You want to expand Pseudata’s international coverage
Key skills: Linguistic knowledge, data collection
→ Read the Add a New Locale guide
5. Add a New SDK
What: Implement full Pseudata support for a new programming language (e.g., Rust, C#, PHP, Ruby).
When to use:
- Your team uses a language not yet supported
- You want to enable Pseudata for a major ecosystem
- You’re comfortable with language internals and tooling
Key skills: Deep knowledge of target language, TypeScript (for emitters), understanding of all Pseudata components
Important: This is a significant undertaking requiring implementation of Generator, emitters, primitives, and comprehensive testing.
→ Read the Add a New SDK guide
The Foundation: Fixture-Based Testing
All contributions must include fixture-based tests. This is not optional: it’s the foundation of Pseudata’s cross-language consistency guarantee.
Why Fixtures Matter
Without fixtures, you could have:
- Go generating "John Smith" for seed 42
- Java generating "John Smyth" due to a subtle bug
- Each language passing its own tests while cross-language compatibility is broken
Fixtures provide a single source of truth: pre-generated test vectors that all languages must match exactly.
How It Works
- Go as Reference: Implement in Go first, run go test -update to generate fixture JSON
- Cross-Language Verification: Other languages load the same fixtures and verify outputs match
- Continuous Validation: Any code change that breaks consistency fails tests immediately
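The update-or-verify cycle can be sketched in Python. Go is the reference implementation in Pseudata; this sketch only mirrors the idea of a -update switch, and the function name and JSON layout (seed strings mapped to outputs) are assumptions rather than Pseudata's actual fixture format.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def check_fixture(
    path: Path,
    generate: Callable[[int], object],
    seeds: Iterable[int],
    update: bool = False,
) -> List[str]:
    """Write fixtures when update=True; otherwise return the seeds whose output differs."""
    # JSON object keys are strings, and values should be JSON-native (lists, not tuples).
    actual = {str(seed): generate(seed) for seed in seeds}
    if update:
        # Reference implementation only: regenerate the shared test vectors.
        path.write_text(json.dumps(actual, indent=2, sort_keys=True), encoding="utf-8")
        return []
    expected = json.loads(path.read_text(encoding="utf-8"))
    # A recorded seed that is missing from `actual` also counts as a mismatch.
    return [seed for seed, value in expected.items() if actual.get(seed) != value]
```

A consuming SDK would call this without update and fail the test whenever the returned list is non-empty.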
What Fixtures Catch
- RNG state bugs (calling rng() multiple times incorrectly)
- UTF-8 handling issues (byte vs. rune slicing)
- Type mapping errors (signed vs. unsigned integers)
- Resource access bugs (wrong locale, missing data)
- Formatting inconsistencies (date/time/phone patterns)
→ Read the detailed Testing Guide
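The UTF-8 case is easy to reproduce. In this Python sketch (the function names are hypothetical), slicing the encoded bytes can split a multi-byte character, while slicing the string counts code points, the Python analogue of converting to []rune in Go. Java and JavaScript strings index UTF-16 code units, so characters outside the Basic Multilingual Plane need extra care there, and none of these approaches handles grapheme clusters such as a letter followed by a combining accent.

```python
def truncate_bytes_buggy(text: str, n: int) -> str:
    # Bug: cuts the UTF-8 byte sequence, which may split a character in two.
    return text.encode("utf-8")[:n].decode("utf-8", errors="replace")


def truncate_chars(text: str, n: int) -> str:
    # Python str slicing counts code points, so no character is split.
    return text[:n]


name = "Ren\u00e9e"  # "Renée" with a precomposed "é", which is two bytes in UTF-8
print(truncate_bytes_buggy(name, 4))  # "Ren" plus U+FFFD: the lone lead byte is replaced
print(truncate_chars(name, 4))  # René
```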
Quick Start
- Fork the repository: github.com/pseudata/pseudata
- Choose your contribution type: Follow one of the five guides above
- Implement your changes: Follow the step-by-step instructions
- Add fixture tests: Ensure cross-language consistency
- Run all tests: Verify nothing broke
- Submit a pull request: Include description and test results
Code Generation Architecture
Pseudata uses TypeSpec as the source of truth and generates code through four paths. Understanding this flow helps you work effectively with the codebase:
- Models: TypeSpec → JSON Schema → quicktype → language-specific classes
- Primitives: TypeSpec → interface-emitter.js → language-specific interfaces
- Arrays: TypeSpec → array-emitter.js → language-specific array classes
- Resources: Text files → resource-emitter.js → embedded language-specific data
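To show what the resource path means in practice, here is a Python sketch of the transformation a resource emitter performs: a plain-text list in, embedded source code out. Pseudata's real emitter is resource-emitter.js and targets every SDK; the one-entry-per-line input format, the constant naming, and emitting Python are assumptions made for this example.

```python
def emit_python_resource(name: str, text: str) -> str:
    """Turn a one-entry-per-line text file into a Python tuple constant."""
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    const = name.upper()  # assumes `name` is a valid identifier, e.g. "first_names"
    if not entries:
        return f"{const} = ()\n"
    # repr() produces a valid, correctly escaped Python string literal.
    body = "".join(f"    {entry!r},\n" for entry in entries)
    return f"{const} = (\n{body})\n"


print(emit_python_resource("first_names", "Anna\nJos\u00e9\n\nLi\n"))
```

Entry order must be preserved exactly as in the source file, because primitives select entries by index.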
Development Setup
Pseudata uses Dev Containers to ensure all contributors have an identical development environment. This eliminates “works on my machine” issues and guarantees consistent builds and tests across all contributions.
Requirements
- Install Docker
- Install VS Code
- Install the Dev Containers extension
Note: While this guide documents VS Code, any dev container-compatible tool can be used. See the full list of supporting tools for alternatives like JetBrains IDEs, Neovim, and others.
```bash
# Clone the repository
git clone https://github.com/pseudata/pseudata.git
cd pseudata

# Open in VS Code
code .
```

When prompted, click “Reopen in Container” (or use Command Palette: “Dev Containers: Reopen in Container”).
The dev container includes everything pre-configured:
- Go, Java, Python, Node.js with correct versions
- All build tools and dependencies
- Task (task runner for build automation)
Working in the Container
```text
# The monorepo structure:
pseudata/                # Main implementation
├── typespec/            # TypeSpec definitions
│   ├── src/             # TypeSpec model definitions
│   ├── lib/             # Custom emitters (interface, array, resource)
│   └── resources/       # Locale-specific data files
├── sdks/                # Language SDKs
│   ├── go/
│   ├── java/
│   ├── python/
│   └── typescript/
└── fixtures/            # Test vectors
```

```bash
# Run code generation
task generate

# Run tests for each language
task test

# See all available tasks
task --list
```

Code Standards
Code Quality
- Follow language conventions: PascalCase for exported identifiers (Go), camelCase (Java/TS), snake_case (Python)
- Add documentation: Doc comments for all public APIs
- Handle UTF-8 correctly: Use rune/character operations, not byte slicing
- Match existing patterns: Study the codebase before implementing
- EditorConfig: The repository includes an .editorconfig file for consistent formatting (indentation, line endings, charset). Editors with EditorConfig support apply these settings automatically.
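Emitters have to map one TypeSpec name onto each language's convention. A minimal Python sketch of that mapping follows; the helper names are hypothetical, and this naive version does not handle acronyms (UUID would become u_u_i_d), which a real emitter would need to special-case.

```python
import re


def to_snake_case(name: str) -> str:
    """hexColor -> hex_color (Python)."""
    # Insert "_" before every uppercase letter except at the start, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_pascal_case(name: str) -> str:
    """hexColor -> HexColor (exported Go identifiers)."""
    return name[:1].upper() + name[1:]


for method in ("email", "hexColor", "phoneNumber"):
    # camelCase names are already correct for Java and TypeScript.
    print(method, to_pascal_case(method), to_snake_case(method))
```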
Documentation Standards
Follow language-specific documentation conventions for all public APIs:
- Go: Use GoDoc format

  ```go
  // Email generates a deterministic email address.
  // The format is lowercase-username@domain.tld.
  func (p *PrimitivesImpl) Email() string {
  ```

- Java: Use JavaDoc format

  ```java
  /**
   * Generates a deterministic email address.
   * The format is lowercase-username@domain.tld.
   * @return a valid email address string
   */
  public String email() {
  ```

- Python: Use PEP 257 docstrings

  ```python
  def email(self) -> str:
      """Generate a deterministic email address.

      The format is lowercase-username@domain.tld.

      Returns:
          str: A valid email address string
      """
  ```

- TypeScript: Use TSDoc format

  ```typescript
  /**
   * Generates a deterministic email address.
   * The format is lowercase-username@domain.tld.
   * @returns A valid email address string
   */
  email(): string {
  ```
Testing Requirements
- Fixture tests are mandatory: All changes must include fixture-based tests
- Cross-language consistency: Verify outputs match across all languages
- Edge cases: Test boundary conditions, empty inputs, special characters
- Performance: Keep generation fast (O(1) for array access)
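One way to get cheap indexed access is to derive each element's seed directly from the array seed and the index, so element i never requires generating elements 0 to i-1. The Python sketch below uses the SplitMix64 mixing function for that derivation; the class name, the mixing scheme, and the element factory are assumptions for illustration, not Pseudata's PseudoArray implementation.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """SplitMix64 step: scrambles a 64-bit value into a well-distributed one."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class PseudoArraySketch(Generic[T]):
    """Fixed-size, read-only sequence whose elements are computed on demand."""

    def __init__(self, seed: int, size: int, make: Callable[[int], T]) -> None:
        self.seed = seed & MASK64
        self.size = size
        self.make = make  # builds one element from a 64-bit element seed

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int) -> T:
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Constant time: the element seed depends only on (array seed, index).
        return self.make(mix64(self.seed ^ mix64(index)))


users = PseudoArraySketch(42, 1_000_000, lambda s: f"user{s % 10_000:04d}")
print(users[999_999])  # computed directly, without touching earlier elements
```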
Commit Message Convention
Pseudata follows Conventional Commits to maintain a clear and structured commit history. This helps with automated changelog generation and makes it easier to understand what each commit does.
Format
```text
<type>(<scope>): <subject>

[optional body]

[optional footer]
```

Use one of the following types:
- feat: A new feature
- fix: A bug fix
- docs: Documentation only changes
- style: Code style changes (formatting, missing semi-colons, etc.)
- refactor: Code change that neither fixes a bug nor adds a feature
- perf: Performance improvement
- test: Adding missing tests or correcting existing tests
- chore: Changes to the build process or auxiliary tools
Scopes
Use scopes to indicate which part of the codebase is affected:
- go, java, python, typescript, csharp, php, rust, swift, dart: Language-specific changes
- generator: Changes to the PCG32 generator
- primitives: Changes to primitive methods
- arrays: Changes to array classes
- resources: Changes to locale/resource data
- typespec: Changes to TypeSpec definitions
- emitter: Changes to code generation emitters
- fixtures: Changes to test fixtures
- docs: Documentation changes
- website: Changes to the documentation website
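The header format is regular enough to check mechanically, for example in a commit-msg hook. This Python sketch is a hypothetical helper, not a tool shipped with Pseudata. It accepts the types listed above, treats the scope as optional (as the Conventional Commits specification does), allows comma-separated scopes, and does not check scopes against the list, since the examples below use scopes such as contributing.

```python
import re
from typing import List, NamedTuple, Optional

TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "chore")

# <type>(<scope>): <subject>, where the scope is optional and may be "go,java".
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9]+(?:,[a-z0-9]+)*)\))?"
    r": (?P<subject>\S.*)$"
)


class Header(NamedTuple):
    type: str
    scopes: List[str]
    subject: str


def parse_header(line: str) -> Optional[Header]:
    """Return the parsed first line of a commit message, or None if it is invalid."""
    match = HEADER.match(line)
    if match is None:
        return None
    scope = match.group("scope")
    return Header(match.group("type"), scope.split(",") if scope else [], match.group("subject"))


print(parse_header("feat(go,java,python,typescript): implement alnum() primitive"))
print(parse_header("Added alnum"))  # None: no recognized type
```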
Examples
```text
# Adding a new feature
feat(primitives): add hexColor() method

# Fixing a bug
fix(java): correct UUID generation for large seeds

# Updating documentation
docs(contributing): add conventional commits section

# Refactoring code
refactor(typescript): simplify resource loading logic

# Adding tests
test(fixtures): add edge cases for digit() primitive

# Multiple scopes
feat(go,java,python,typescript): implement alnum() primitive
```

Breaking Changes
If your commit introduces a breaking change, add BREAKING CHANGE: in the footer:

```text
feat(primitives): rename int() to nextInt()

BREAKING CHANGE: int() method has been renamed to nextInt() for consistency
```

Changelog
Pseudata maintains a changelog following the Keep a Changelog format. The changelog provides a human-readable history of notable changes for each version.
When to Update the Changelog
Update CHANGELOG.md when making user-facing changes:
- feat: Add entry under “Added” for new features
- fix: Add entry under “Fixed” for bug fixes
- BREAKING CHANGE: Add entry under “Changed” with clear migration guidance
- perf: Add entry under “Changed” for significant performance improvements
- deprecation: Add entry under “Deprecated” for features being phased out
Skip changelog updates for:
- Internal refactoring (refactor)
- Test changes (test)
- Documentation updates (docs)
- Build/tooling changes (chore)
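Adding an entry by hand is usually simplest, but the rule is mechanical enough to script. This Python sketch is a hypothetical helper: it finds the ## [Unreleased] block, appends a bullet to the requested ### subsection, and creates that subsection at the end of the block if it does not exist yet. It does not reorder subsections into Keep a Changelog's conventional order.

```python
def add_unreleased_entry(changelog: str, section: str, entry: str) -> str:
    """Append '- entry' under '### section' inside '## [Unreleased]'."""
    lines = changelog.splitlines()
    start = lines.index("## [Unreleased]")  # ValueError if the block is missing
    # The Unreleased block ends at the next release heading, or at end of file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## [")),
        len(lines),
    )
    heading = f"### {section}"
    try:
        position = lines.index(heading, start + 1, end)
    except ValueError:
        # Subsection missing: add it just before the next release heading.
        lines[end:end] = [heading, f"- {entry}", ""]
        return "\n".join(lines) + "\n"
    # Skip existing bullets and their indented continuation lines.
    position += 1
    while position < end and lines[position].startswith(("- ", "  ")):
        position += 1
    lines.insert(position, f"- {entry}")
    return "\n".join(lines) + "\n"


example = "# Changelog\n\n## [Unreleased]\n\n### Added\n- New `hexColor()` primitive\n\n## [1.0.0] - 2024-01-15\n"
print(add_unreleased_entry(example, "Fixed", "Python: Fixed resource loading for Windows paths"))
```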
Changelog Format
```markdown
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- New `hexColor()` primitive for generating hex color codes
- Support for Rust language SDK

### Changed
- Improved performance of `digit()` by 40%

### Fixed
- UTF-8 handling in `nickname()` for non-ASCII characters
- UUID generation for large seed values in Java

### Deprecated
- `emailVerified()` will be removed in v2.0.0, use `probability(0.9)` instead

## [1.0.0] - 2024-01-15

### Added
- Initial release with Go, Java, Python, and TypeScript support
...
```

Best Practices
- Be specific: Include method names, affected languages, and clear descriptions
- User-focused: Write for library users rather than describing implementation details
- Link issues: Reference GitHub issues when applicable, e.g. (#123)
- Breaking changes first: Always list breaking changes at the top of a release section
- Group by language: For language-specific changes, group entries together
Example Entries
```markdown
### Added
- `alnum()` primitive for alphanumeric strings (all languages)
- Java: Added `BigInteger` support for large world seeds

### Fixed
- Python: Fixed resource loading for Windows paths
- TypeScript: Corrected type definitions for `PseudoArray`

### Changed
- **BREAKING**: Renamed `int()` to `nextInt()` for API consistency (all languages)
  - Migration: Replace `p.int()` with `p.nextInt()`
```

Versioning
Pseudata follows Semantic Versioning 2.0.0 (SemVer):
Given a version number MAJOR.MINOR.PATCH:
- MAJOR (e.g., 1.0.0 → 2.0.0): Incompatible API changes
  - Breaking changes to public APIs
  - Removal of deprecated features
  - Changes to deterministic output (same seed produces different data)
- MINOR (e.g., 1.0.0 → 1.1.0): Backwards-compatible new functionality
  - New primitives, models, or arrays
  - New locale/resource support
  - New language SDK implementations
  - Performance improvements
- PATCH (e.g., 1.0.0 → 1.0.1): Backwards-compatible bug fixes
  - Bug fixes that don’t change the API
  - Documentation updates
  - Internal refactoring
Important: Because Pseudata is a deterministic data generation library, changes that alter the output for a given seed are considered BREAKING CHANGES, even if the API signature remains the same.
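Combined with the commit conventions above, these rules can be applied mechanically. The Python sketch below is hypothetical (this guide does not describe Pseudata's release tooling) and maps commit types to bump levels as this section lists them: a BREAKING CHANGE: footer forces a major bump, feat and perf are minor, fix, docs, and refactor are patch, and other types do not trigger a release. Output-changing fixes still need the BREAKING CHANGE: footer, because the type alone cannot reveal that a seed now produces different data.

```python
from typing import Iterable

# Bump levels, ordered so that max() picks the most significant one.
NONE, PATCH, MINOR, MAJOR = range(4)
LEVEL_BY_TYPE = {"feat": MINOR, "perf": MINOR, "fix": PATCH, "docs": PATCH, "refactor": PATCH}


def commit_level(message: str) -> int:
    # Naive check: a stricter parser would look only at the footer lines.
    if "BREAKING CHANGE:" in message:
        return MAJOR
    header = message.splitlines()[0] if message else ""
    # "feat(go,java): ..." -> "feat"
    commit_type = header.split("(", 1)[0].split(":", 1)[0].strip()
    return LEVEL_BY_TYPE.get(commit_type, NONE)


def next_version(current: str, messages: Iterable[str]) -> str:
    """Compute the next MAJOR.MINOR.PATCH version from commit messages."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = max((commit_level(m) for m in messages), default=NONE)
    if level == MAJOR:
        return f"{major + 1}.0.0"
    if level == MINOR:
        return f"{major}.{minor + 1}.0"
    if level == PATCH:
        return f"{major}.{minor}.{patch + 1}"
    return current


print(next_version("1.0.0", [
    "fix(java): correct UUID generation for large seeds",
    "feat(primitives): add hexColor() method",
]))  # 1.1.0
```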
Pre-release Versions
- Alpha (1.0.0-alpha.1): Early development, unstable
- Beta (1.0.0-beta.1): Feature-complete, testing phase
- Release Candidate (1.0.0-rc.1): Final testing before stable release
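Pre-release versions sort before the release they lead up to. The precedence rules in the Semantic Versioning 2.0.0 specification compare pre-release identifiers left to right: numeric identifiers compare numerically and rank below alphanumeric ones, a longer identifier list wins when all earlier ones are equal, and build metadata is ignored. A minimal Python sort key implementing those rules (the function name is hypothetical) looks like this:

```python
from typing import Tuple


def precedence_key(version: str) -> Tuple:
    """Sort key implementing SemVer 2.0.0 precedence (build metadata ignored)."""
    version = version.split("+", 1)[0]  # drop build metadata
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A normal version outranks any pre-release of the same core version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers (0, n) sort before alphanumeric ones (1, text).
        (0, int(ident)) if ident.isascii() and ident.isdigit() else (1, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-beta.1", "1.0.0-alpha.1"]
print(sorted(versions, key=precedence_key))
# ['1.0.0-alpha.1', '1.0.0-beta.1', '1.0.0-rc.1', '1.0.0']
```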
License
Pseudata is licensed under the Apache License 2.0. By contributing, you agree that your contributions will be licensed under the same terms.
What this means for contributors:
- ✅ You retain copyright to your contributions
- ✅ You grant the project a perpetual, worldwide, non-exclusive license to use your code
- ✅ You confirm you have the right to submit the contribution
- ✅ Your contributions can be used commercially, modified, and distributed
- ✅ Patent grants are included (protection against patent claims)
Key requirements:
- Include the Apache 2.0 license header in new files (when applicable)
- Preserve existing copyright notices
- Document significant changes in NOTICE file (if required)
For the full license text, see LICENSE in the repository.
If you’re contributing on behalf of your employer, ensure you have permission to contribute code under this license.
Pull Request Process
- Branch naming: feature/model-invoice, fix/utf8-nickname, lang/rust
- Commit messages: Follow Conventional Commits format (see above)
- Update CHANGELOG.md: Add entries for user-facing changes under [Unreleased]
- PR description: Explain what, why, how; include before/after examples
- Tests passing: All language tests must pass
- Documentation: Update relevant docs if adding new features
Getting Help
- GitHub Issues: Report bugs or request features
- GitHub Discussions: Ask questions, share ideas
- Reference Implementations: Check existing Go/Java/Python/TypeScript code
- TypeSpec Docs: typespec.io for TypeSpec questions
Code of Conduct
Pseudata is committed to providing a welcoming and inclusive environment for all contributors. Please read the Code of Conduct to understand the standards expected from everyone participating in the Pseudata community.
Recognition
Contributors are recognized in:
- Repository CONTRIBUTORS file
- Release notes for their contributions
- GitHub contributor graph
Thank you for helping make Pseudata better! 🎉
© 2025 Pseudata Project. Open Source under Apache License 2.0.