unfoldingWord Bible Translation Resources Ecosystem: Developer Guide

Introduction

This technical guide provides complete documentation for the unfoldingWord Bible translation resource ecosystem - an interconnected system of open-source resources that enables Mother Tongue Translators to create Scripture translations in their heart languages.

Who This Helps:

Developers: Build Bible translation apps using APIs, word-level alignment, and cross-resource linking
Technical Decision Makers: Plan system integration, optimize performance, and design for multiple languages
Content Creators: Understand how translation resources connect and support quality workflows
Open Source Contributors: Follow technical standards, contribute new resources, and extend the ecosystem

Key Benefits: Saves development time with ready-to-use specifications, ensures new tools integrate seamlessly with existing apps, includes quality checks for reliable implementations, and simplifies working with interconnected Bible translation resources.

TLDR

unfoldingWord provides an interconnected ecosystem of open-source Bible translation resources designed to help Mother Tongue Translators create Scripture in their heart languages. The system centers around word-level alignment between original biblical texts and gateway language translations, with supporting resources that provide contextual guidance.

Key Components:

Source Texts: Hebrew Bible (UHB) and Greek New Testament (UGNT)
Gateway Translations: Literal (ULT/GLT) and Simplified (UST/GST) texts with word-level alignment to originals
Support Resources: Translation Notes (TN), Translation Words (TW), Translation Academy (TA), and Translation Questions (TQ)

For Developers:

All resources follow Resource Container (RC) specification with standardized manifest files.
Resources are hosted on Door43 Content Service with REST API access
Word alignment enables precise cross-resource navigation and features like word-level highlighting
System supports multiple gateway languages with parallel resource structures
Three main formats: USFM (Bible text), TSV (structured data), Markdown (articles)

Getting Started: Use the Catalog API to discover resources, load primary texts (ULT/UST), then add supporting resources based on manifest dependencies. The alignment layer connects everything together.

Overview

Mission & Core Concepts

unfoldingWord provides open-source Bible translation resources under CC BY-SA 4.0 licensing to enable Mother Tongue Translators (MTTs) to translate Scripture from Gateway Languages into their Heart Languages.

Resource Ecosystem Architecture

The translation resource ecosystem consists of three foundational layers:

Source Texts: Original language texts (Hebrew, Greek, Aramaic) with modern gateway language translations
Alignment Layer: Precise word-level connections between original and gateway languages
Support Resources: Contextual guidance, definitions, methodology, and quality assurance tools

Key Design Principles

Interconnectedness: All resources link together through standardized reference systems
Precision: Word-level alignment enables exact targeting of translation guidance
Extensibility: Resource Container specification allows new resource creation
Multilingual: Gateway language organizations can create parallel resource sets
Open Access: Creative Commons licensing ensures global accessibility

Resource Types & Functions

The unfoldingWord ecosystem includes the following resources for Bible translators:

Resource	Format	Purpose	Key Features
UHB	USFM	Original language texts (Hebrew)	Tokenized
UGNT	USFM	Original language texts (Greek)	Tokenized
ULT/GLT	USFM	Literal translation preserving grammatical and syntactic structures, idiomatic expressions and figures of speech of the original text as much as possible	Word-aligned, form-centric, tokenized
UST/GST	USFM	Simplified translation that expresses the meaning of the original text as clearly as possible when the forms in the GLT/ULT are not clear or natural in the target language	Word-aligned, meaning-based, tokenized
Translation Notes	TSV	Verse-specific translation guidance, grammar explanations, contextual explanations, and cultural background information	Linked to aligned words and Translation Academy articles
Translation Words	Markdown	Biblical term definitions	Consistent terminology
Translation Words Links	TSV	Internal linking between aligned words and Translation Words definitions by occurrence in verse	Precise occurrence tracking
Translation Questions	TSV	Verse-specific quality assurance questions	Community checking
Translation Academy	Markdown	Translation methodology, cultural issues, and quality standards	Training materials

For a more comprehensive description of the resources and their purpose, see the unfoldingWord for Translators page. This document is a technical specification of the resources and their interconnections meant to help developers understand the resources and how they work together.

Core Architecture

graph TB
    subgraph "Original Language Texts"
        UHB[Hebrew Bible - UHB]
        UGNT[Greek New Testament - UGNT]
    end
    
    subgraph "Alignment Layer" 
        WA[Word Alignment Data<br/>Original ↔ Gateway]
    end
    
    subgraph "Gateway Language Texts"
        ULT[ULT/GLT<br/>Literal Translation]
        UST[UST/GST<br/>Simplified Translation]
    end
    
    subgraph "Article-Based Resources"
        TW[Translation Words<br/>Term Definitions]
        TA[Translation Academy<br/>Methodology Training]
    end
    
    subgraph "Verse-Specific Resources"
        TN[Translation Notes<br/>Contextual Guidance]
        TWL[Translation Words Links<br/>Word Connections]
        TQ[Translation Questions<br/>Quality Assurance]
    end
    
    %% Foundation to Alignment

    WA --> UHB
    WA --> UGNT
    
    %% Alignment to Gateway Texts (bidirectional - alignment enables texts, texts verify alignment)
    WA --> ULT
    WA --> UST
    
    %% Alignment to Supporting Resources
    TN --> WA
    TWL --> WA
    
    %% Supporting Resource Interconnections
    TWL --> TW
    TN --> TA
    TQ --> ULT
    TQ --> UST
 
     %% Critical Core (Central Hub)
     style WA fill:#ffcccb,stroke:#333,stroke-width:3px
     
     %% Primary Gateway Texts (Essential for Translation)
     style ULT fill:#b3e5fc,stroke:#333,stroke-width:2px
     style UST fill:#b3e5fc,stroke:#333,stroke-width:2px
     
     %% Original Source Texts (Foundational Authority)
     style UHB fill:#e1bee7,stroke:#333,stroke-width:2px
     style UGNT fill:#e1bee7,stroke:#333,stroke-width:2px
     
     %% Verse-Specific Support (Targeted Guidance)
     style TN fill:#dcedc8,stroke:#333,stroke-width:1px
     style TWL fill:#dcedc8,stroke:#333,stroke-width:1px
     style TQ fill:#dcedc8,stroke:#333,stroke-width:1px
     
     %% Article-Based Reference (General Resources)
     style TW fill:#fff3cd,stroke:#333,stroke-width:1px
     style TA fill:#fff3cd,stroke:#333,stroke-width:1px

Technical Specifications

Resource Container (RC) Specification

Purpose & Structure

The Resource Container specification provides a standardized framework for organizing and connecting Bible resources. RC allows you to declare the file and directory structure of a resource and describe it comprehensively through a manifest file. This enables automatic discovery, validation, and interconnection of resources across the entire ecosystem.

Example: English ULT Resource Container

Here's what an actual Resource Container looks like in practice:

en_ult/
├── manifest.yaml             # Resource metadata and dependencies
├── LICENSE.md                # CC BY-SA 4.0 license
├── 01-GEN.usfm               # Genesis with word alignment data
├── 02-EXO.usfm               # Exodus with word alignment data
├── 03-LEV.usfm               # Leviticus with word alignment data
├── ...                       # All 66 books
├── 40-MAT.usfm               # Matthew with word alignment data
├── 41-MRK.usfm               # Mark with word alignment data
└── 66-REV.usfm               # Revelation with word alignment data

Key Files:

manifest.yaml - Declares resource identity, dependencies, and file structure:

dublin_core:
  conformsto: 'rc0.2'                    # RC specification version
  identifier: 'ult'                      # Unique resource identifier
  language: 
    identifier: 'en'                     # BCP 47 language code
    direction: 'ltr'                     # Text direction (ltr/rtl)
  subject: 'Bible'                       # Resource category for filtering
  type: 'bundle'                         # RC container type (bundle/help/dict/man)
  relation:                              # Array of related resources
    - 'en/tn'                            # Translation Notes (same language)
    - 'en/twl'                           # Translation Words Links
    - 'hbo/uhb'                          # Hebrew Bible source
    - 'el-x-koine/ugnt'                  # Greek New Testament source
    - 'en/ta'                            # Translation Academy
    - 'en/tw'                            # Translation Words
  version: '85'                          # Resource version number

checking:
  checking_level: '3'                    # Quality assurance level (1-3)

projects:                                # File mappings for each book
  - identifier: 'gen'                    # Book identifier (3-letter code)
    path: './01-GEN.usfm'                # Relative file path
    sort: 1                              # Display/processing order
    versification: 'kjv'                 # Versification system
  - identifier: 'exo'
    path: './02-EXO.usfm'
    sort: 2
    versification: 'kjv'
  # ... continues for all 66 books

01-GEN.usfm - USFM text with embedded word alignment:

\v 1 \zaln-s |x-strong="H07225" x-lemma="רֵאשִׁית" x-content="בְּרֵאשִׁית"\*\w In|x-occurrence="1"\w* \w the|x-occurrence="1"\w* \w beginning|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H0430" x-lemma="אֱלֹהִים" x-content="אֱלֹהִים"\*\w God|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H01254" x-lemma="בָּרָא" x-content="בָּרָא"\*\w created|x-occurrence="1"\w*\zaln-e\*...

Basic Requirements for Creating a Resource Container

To create a valid Resource Container, you need:

Manifest File (manifest.yaml) with required Dublin Core metadata
License File (LICENSE.md) - typically CC BY-SA 4.0 for unfoldingWord resources
Content Files in the appropriate format (USFM for texts, TSV for data, Markdown for articles)
Consistent Structure following RC directory patterns for your container type
Valid Identifiers using established conventions (language codes, book codes, etc.)

This standardized structure ensures that all unfoldingWord resources work together seamlessly, regardless of language or resource type.

Dublin Core: The manifest file uses Dublin Core metadata standards to describe the resource with standardized fields like identifier, language, subject, type, and relation. This ensures consistent metadata across all Resource Containers and enables automatic discovery and interconnection of resources.

RC Core Functions:

graph TD
    RC[Resource Container<br/>Specification]
    
    RC --> S[Standardized Structure<br/>Consistent directories & files]
    RC --> M[Metadata Framework<br/>Dublin Core manifest]
    RC --> L[Linking System<br/>URI-based references]
    RC --> C[Compatibility<br/>New resource integration]
    
    style RC fill:#fff2cc
    style S fill:#e8f5e8
    style M fill:#e8f5e8
    style L fill:#e8f5e8
    style C fill:#e8f5e8

Core RC Principles

Standardized Metadata: Every resource has Dublin Core metadata in manifest.yaml
Predictable Structure: Consistent directory layout and file naming
Cross-Resource Linking: URI-based references enable automatic navigation
Version Management: Built-in versioning and dependency tracking
Extensibility: Framework supports new resource types and subjects
Platform Agnostic: Works with any documents hosting platform that supports distributed version control and API access

How RC Provides Structure and Interconnection

1. Resource Structure Standardization: RC defines consistent directory layouts and file naming patterns:

# All resources follow same manifest structure
dublin_core:
  identifier: 'resource-name'
  type: 'bundle' | 'help' | 'dict' | 'man'
  relation: ['connected/resources']

2. Interconnection Through Relations: Resources declare their dependencies and connections:

# en_ult manifest declares connections to:
relation:
  - en/tn    # Translation Notes
  - en/twl   # Translation Words Links  
  - hbo/uhb  # Hebrew Bible source

3. Cross-Resource Navigation: RC links provide standardized addressing:

rc://en/tn/help/gen/01/02    # Translation Note for Genesis 1:2
rc://en/tw/dict/bible/kt/god # Translation Words entry for "god"

4. New Resource Compatibility: Following RC structure ensures new resources integrate automatically:

# New commentary resource
dublin_core:
  identifier: 'commentary'
  type: 'help'
  relation: ['en/ult', 'en/ust']  # Links to existing resources

5. Multi-Language Consistency: RC enables parallel resource structures across gateway languages:

en_ult   → es-419_glt  → fr_glt     # Same container type
en_tn    → es-419_tn   → fr_tn      # Same TSV structure
en_tw    → es-419_tw   → fr_tw      # Same markdown format

Practical Benefits:

Structure: Predictable file organization across all resources
Compatibility: New resources automatically work with existing tools
Interconnection: Standardized linking enables seamless navigation
Quality Control: Metadata provides versioning and quality tracking

RC Container Types & File Patterns

Type	Structure	Content Pattern	Example
`bundle`	Flat USFM files	`{NN}-{BOOK}.usfm`	`01-GEN.usfm`, `40-MAT.usfm`
`help`	Flat TSV files	`{prefix}_{BOOK}.tsv`	`tn_GEN.tsv`, `tq_MAT.tsv`
`dict`	Nested Markdown	`bible/{type}/{term}.md`	`bible/kt/compassion.md`, `bible/names/aaron.md`
`man`	Nested chapters	`{category}/{topic}/{NN}.md`	`translate/figs-metaphor/01.md`, `intro/translation-guidelines/01.md`

Manifest Structure (Key Fields)

dublin_core:
  conformsto: 'rc0.2'                    # RC specification version
  identifier: 'ult'                      # Resource identifier
  language:
    identifier: 'en'                     # BCP 47 language code
    direction: 'ltr'                     # Text direction (ltr/rtl)
  relation:                              # Dependencies array
    - 'en/tn'                           # Same language resources
    - 'hbo/uhb?v=2.1.30'               # Source text with version
  subject: 'Bible'                       # Resource subject for filtering
  type: 'bundle'                         # RC container type
  version: '85'                          # Resource version number

checking:
  checking_level: '3'                    # Quality level (1-3)

projects:                                # Book definitions
  - identifier: 'gen'                    # Book identifier
    path: './01-GEN.usfm'               # File path
    sort: 1                             # Sort order
    versification: 'kjv'                # Versification system

For more information on the Resource Container specification, see the Resource Container specification page.

RC Linking System

Link Format: rc://language/resource/type/project/chapter/chunk

Examples:

rc://en/ult/book/gen/01/02 - ULT Genesis 1:2
rc://en/tn/help/gen/01/02 - Translation Note for Genesis 1:2
rc://en/tw/dict/bible/kt/god - Translation Words entry for "god"
rc://*/ta/man/translate/figs-metaphor - Translation Academy article (* wildcard for any language)

Hosting Resource Containers

unfoldingWord resources are currently hosted on the Door43 Content Service platform. This platform is a fork of Gitea (an open source Git server like GitHub) and provides a robust infrastructure for hosting and managing resources, including:

Version control
Collaboration (forking, pull requests, etc.)
API access
Resource discovery

Although any other platform that supports these features can host resource containers, this guide will focus on the Door43 Content Service and how unfoldingWord RCs are hosted and accessed in the Door43 ecosystem.

The DCS hosts unfoldingWord resources as git repositories using three organizational models:

1. Single-Language Organizations: Most gateway language communities create dedicated organizations (e.g., es-419_gl for Spanish, fr_gl for French) that contain all resource repositories for their specific language. This enables focused collaboration within language communities and clear ownership models.

2. Multi-Language Organizations: The core unfoldingWord organization hosts resources across multiple languages, including the original English resources and some community contributions in other languages, providing centralized access to foundational resources.

3. Individual User Repositories: Users can host resources under their personal accounts for private development, experimentation, or small group collaboration, though this limits broader community collaboration and discoverability.

This flexible organizational structure supports both large-scale community collaboration and individual development workflows, while maintaining clear resource discovery patterns for developers.

You can find the unfoldingWord organization containing all its resources at https://git.door43.org/unfoldingWord.

API Reference

Platform Overview

The Door43 Content Service provides a comprehensive REST API for managing and accessing unfoldingWord resources. This Git-based platform (fork of Gitea) enables both resource discovery through enhanced catalog endpoints and direct repository access through standard Git API patterns.

Primary Platform: Door43 Content Service (https://git.door43.org/)

API Documentation: https://git.door43.org/api/swagger
Base URL: https://git.door43.org/api/v1
API Version: v1 (current)
Response Format: JSON

Catalog API Endpoints

The Catalog API provides enhanced resource discovery with intelligent filtering and relationship resolution.

Resource Discovery:

GET /api/v1/catalog/search
Query Parameters:
  - lang: Language code (en, es-419, fr, etc.)
  - subject: Resource subject (Bible, Translation Notes, etc.)
  - stage: Release stage (prod, preprod, draft)
  - owner: Organization name
  - repo: Repository name
  - tag: Specific version tag

Response: Array of catalog entries with metadata and relationships

Organization & Language Listing:

GET /api/v1/catalog/list/owners
Response: Array of organization objects with resource counts

GET /api/v1/catalog/list/languages  
Response: Array of language objects with available resources

GET /api/v1/catalog/list/subjects
Response: Array of subject categories with resource counts

Individual Resource Access:

GET /api/v1/catalog/entry/{owner}/{repo}/{ref}
Query Parameters:
  - lang: Filter by language (optional)
  
Response: Complete catalog entry with manifest data and file listings

Repository API Endpoints

Standard Gitea-compatible endpoints for direct repository manipulation and content access.

Repository Metadata:

GET /api/v1/repos/{owner}/{repo}
Response: Repository object with metadata, clone URLs, and statistics

GET /api/v1/repos/{owner}/{repo}/branches
Response: Array of branch objects with commit information

GET /api/v1/repos/{owner}/{repo}/tags
Response: Array of tag objects with release information

Content Access:

GET /api/v1/repos/{owner}/{repo}/contents/{filepath}
Query Parameters:
  - ref: Branch or tag name (default: master)
  
Response: File content object with base64 encoding or download URL

GET /api/v1/repos/{owner}/{repo}/archive/{ref}.{format}
Formats: zip, tar.gz
Response: Binary archive download

Release Management:

GET /api/v1/repos/{owner}/{repo}/releases
Response: Array of release objects with assets and metadata

GET /api/v1/repos/{owner}/{repo}/releases/{id}
Response: Individual release object with detailed information

Authentication & Access Control

Authentication Methods:

Most unfoldingWord resources are publicly accessible, but authentication provides enhanced capabilities:

API Token Authentication:

Authorization: token <API_TOKEN>
Content-Type: application/json

Benefits of Authentication:

Increased rate limits (1000+ requests/hour vs 60/hour)
Access to private repositories
CRUD operations on resources
Enhanced catalog filtering options
Priority access during high traffic

Token Generation:

Create account on Door43 Content Service
Use the API to generate a token (/api/v1/users/{username}/tokens)
Use the token to authenticate your requests

Rate Limiting & Performance

Rate Limit Headers:

X-RateLimit-Limit: 60        # Maximum requests per hour
X-RateLimit-Remaining: 45     # Remaining requests in current window
X-RateLimit-Reset: 1640995200 # Reset time (Unix timestamp)

Performance Optimization Strategies:

Caching Strategy:

Cache manifest files for dependency resolution
Store frequently accessed content locally
Implement intelligent cache invalidation based on version tags
Use ETags for conditional requests

Batch Operations:

Download complete resource archives for offline usage
Use catalog API for bulk discovery operations
Implement parallel requests for independent resources
Combine multiple file requests into archive downloads

Error Resilience:

Implement exponential backoff for 429 (rate limit) responses
Handle network timeouts with retry logic
Gracefully degrade when optional resources are unavailable
Provide fallback mechanisms for core functionality

Repository Organization

Organizational Models

The Door43 Content Service supports multiple organizational patterns to accommodate different community structures and collaboration models:

1. Single-Language Organizations:

The most common pattern where gateway language communities create dedicated organizations containing all resources for their specific language:

Organization: es-419_gl (Spanish Gateway Language)
├── es-419_glt      # Spanish Literal Translation
├── es-419_gst      # Spanish Simplified Translation  
├── es-419_tn       # Spanish Translation Notes
├── es-419_tw       # Spanish Translation Words
├── es-419_twl      # Spanish Translation Words Links
├── es-419_tq       # Spanish Translation Questions
└── es-419_ta       # Spanish Translation Academy

Benefits:

Centralized language-specific collaboration
Clear ownership and governance model
Simplified discovery for language communities
Consistent quality control within language group

2. Multi-Language Organizations:

Organizations hosting resources across multiple languages, typically for foundational or cross-community resources:

Organization: unfoldingWord (Multi-Language)
├── en_ult          # English Literal Translation
├── en_ust          # English Simplified Translation
├── hbo_uhb         # Hebrew Bible (original language)
├── el-x-koine_ugnt # Greek New Testament (original language)
├── fr_ult          # French Literal Translation
└── hi_ult          # Hindi Literal Translation

Use Cases:

Original language texts (Hebrew, Greek)
Reference implementations for new languages
Cross-community resource standards
Experimental or prototype resources

3. Individual User Repositories:

Personal accounts hosting resources for development, testing, or specialized purposes:

User: translator_john
├── hi_ult_draft    # Hindi translation in development
├── hi_tn_personal  # Personal translation notes
└── test_resources  # Experimental resource formats

Repository Naming Conventions

Standard Naming Pattern: {language-code}_{resource-identifier}

Language Code Standards:

BCP 47 Compliance: Use standard language tags (e.g., en, es-419, fr, hi)
Script Variants: Include script codes when necessary (e.g., ur-Arab, hi-Deva)
Regional Variants: Specify regions for localized versions (e.g., es-419 for Latin American Spanish)
Original Languages: Use scholarly conventions (hbo for Biblical Hebrew, el-x-koine for Koine Greek)

Resource Identifier Conventions:

Resource Type	Identifier	Example Repository
Literal Translation	`ult` or `glt`	`en_ult`, `es-419_glt`
Simplified Translation	`ust` or `gst`	`en_ust`, `fr_gst`
Translation Notes	`tn`	`en_tn`, `hi_tn`
Translation Words	`tw`	`en_tw`, `ur_tw`
Translation Words Links	`twl`	`en_twl`, `es-419_twl`
Translation Questions	`tq`	`en_tq`, `fr_tq`
Translation Academy	`ta`	`en_ta`, `hi_ta`
Hebrew Bible	`uhb`	`hbo_uhb`
Greek New Testament	`ugnt`	`el-x-koine_ugnt`

Repository Structure

Release Management:

Tagging Strategy:
├── v1.0.0          # Major release with breaking changes
├── v1.1.0          # Minor release with new content
├── v1.1.1          # Patch release with bug fixes
├── latest          # Points to current stable release
└── production      # Production deployment tag

Discovery Patterns

Hierarchical Resource Discovery:

Discovery Flow:
1. Organization Discovery
   GET /api/v1/catalog/list/owners
   
2. Repository Enumeration  
   GET /api/v1/repos?org={organization}
   
3. Resource Classification
   Parse repository names for language/resource patterns
   
4. Dependency Resolution
   GET /api/v1/repos/{owner}/{repo}/contents/manifest.yaml

Language-Based Filtering:

Filter Strategies:
1. Language Code Matching: Extract language from repository names
2. Manifest Language Field: Check dublin_core.language.identifier  
3. Resource Type Detection: Parse resource identifier patterns
4. Cross-Reference Validation: Verify language consistency across dependencies

Subject-Based Filtering:

Subject-based filtering uses the dublin_core.subject field in manifest files to categorize and filter resources by their content type and purpose.

Standard Subject Categories:

Subject	Description	Example Resources
`Bible`	Original language biblical texts or other unaligned texts	`hbo_uhb`, `el-x-koine_ugnt`
`Aligned Bible`	Gateway language texts with word alignment	`en_ult`, `en_ust`, `fr_glt`
`Translation Notes`	Verse-specific translation guidance	`en_tn`, `es-419_tn`
`Translation Words`	Biblical term definitions and explanations	`en_tw`, `hi_tw`
`Translation Words Links`	Cross-reference links between texts and definitions	`en_twl`, `fr_twl`
`Translation Questions`	Quality assurance questions for checking	`en_tq`, `ur_tq`
`Translation Academy`	Training and methodology resources	`en_ta`, `es-419_ta`

Subject-Based Discovery Workflow:

Resource Discovery by Subject:
1. Catalog Query by Subject
   GET /api/v1/catalog/search?subject=Bible&stage=prod
   
2. Subject-Specific Filtering
   Filter results by exact subject match or category grouping
   
3. Quality Level Filtering
   Further filter by checking_level (1-3) within subject category
   
4. Language Cross-Reference
   Find parallel resources in same subject across languages

Subject Hierarchy & Relationships:

Subject Dependency Patterns:
├── Bible (Source Texts)
│   └── Aligned Bible (Gateway Translations)
│       ├── Translation Notes (Contextual Guidance)
│       ├── Translation Words Links (Term References)
│       └── Translation Questions (Quality Checks)
├── Translation Words (Term Definitions)
│   └── Referenced by Translation Words Links
└── Translation Academy (Methodology)
    └── Referenced by Translation Notes

Advanced Subject Filtering:

Complex Subject Queries:
1. Multi-Subject Filtering
   Find resources spanning multiple subjects (e.g., Bible + Aligned Bible)
   
2. Subject Exclusion
   Filter out specific subject categories (e.g., exclude drafts)
   
3. Subject-Language Intersection
   Find all resources of specific subject in target language
   
4. Dependency-Aware Subject Loading
   Load complete subject chains (e.g., Bible → Aligned Bible → Translation Notes)

Implementation Patterns:

Subject Filter Implementation:
1. Manifest Parsing: Extract subject from dublin_core.subject field
2. Category Mapping: Map subjects to application feature sets
3. Progressive Loading: Load core subjects first, enhancement subjects later
4. User Preferences: Allow users to select preferred subject categories
5. Offline Prioritization: Cache high-priority subjects for offline access

Access Patterns & Performance

Efficient Repository Access:

Access Pattern	Use Case	Performance Consideration
Single Resource	Load specific translation	Direct repository access
Language Set	Load all resources for language	Parallel repository queries
Cross-Language	Compare translations	Batch manifest loading
Dependency Chain	Follow resource relations	Recursive dependency resolution

Caching Strategies:

Repository Cache Layers:
1. Organization Metadata: Cache organization listings and descriptions
2. Repository Index: Cache repository names and basic metadata  
3. Manifest Cache: Cache parsed manifest files with TTL
4. Content Cache: Cache frequently accessed resource content
5. Dependency Graph: Cache resolved dependency relationships

Repository Relationships

Direct Dependencies:

# Example: en_ult manifest.yaml
relation:
  - hbo/uhb           # Source text dependency
  - el-x-koine/ugnt   # Source text dependency  
  - en/ust            # Parallel translation

Implicit Relationships:

Resource Ecosystem Connections:
├── Source Texts (UHB/UGNT)
│   └── Gateway Translations (ULT/UST)
│       ├── Translation Notes (TN)
│       ├── Translation Words Links (TWL)
│       └── Translation Questions (TQ)
├── Reference Materials
│   ├── Translation Words (TW) ← Referenced by TWL
│   └── Translation Academy (TA) ← Referenced by TN
└── Cross-Language Parallels
    └── Same resource type across languages

Quality Control & Governance

Repository-Level Quality Gates:

Quality Check	Implementation	Enforcement Level
Naming Compliance	Automated validation on creation	Mandatory
Manifest Schema	CI/CD pipeline validation	Mandatory
Content Format	Format-specific validators	Recommended
Cross-Reference	Dependency resolution testing	Recommended
Version Consistency	Compatibility matrix checking	Advisory

Governance Models:

Permission Structures:
1. Organization Admins: Full access to all repositories in organization
2. Repository Maintainers: Write access to specific repositories
3. Contributors: Submit changes via pull requests
4. Readers: Public read access to published resources

Migration & Reorganization

Repository Movement Patterns:

Common Migration Scenarios:
1. User → Organization: Personal project becomes community resource
2. Organization → Organization: Language community reorganization  
3. Multi-Language → Single-Language: Resource localization
4. Repository Splitting: Large repository divided by book/resource type
5. Repository Merging: Multiple related repositories consolidated

URL Preservation:

Redirect Strategies:
1. GitHub-style redirects for moved repositories
2. Catalog API redirect responses for deprecated locations
3. Manifest relation updates for dependency redirection
4. API versioning to maintain backward compatibility

Best Practices

Repository Creation Guidelines:

Consistent Naming: Follow established language code and resource identifier patterns
Complete Manifests: Include all required Dublin Core metadata fields
Clear Dependencies: Explicitly declare all resource relationships
Version Tagging: Use semantic versioning for all releases
Documentation: Provide README files with resource-specific guidance

Organizational Strategies:

Language Focus: Prefer single-language organizations for community management
Resource Grouping: Keep related resources in same organization when possible
Access Management: Implement appropriate permission levels for collaboration
Quality Standards: Establish consistent quality gates across repositories
Discovery Optimization: Structure organizations for efficient resource discovery

Integration Strategy

graph TD
    A[Resource Discovery<br/>via Catalog API] --> B[Load Primary Resource<br/>ULT/GLT or UST/GST]
    B --> C[Parse Manifest<br/>Relations]
    C --> D[Filter Dependencies<br/>by App Requirements]
    D --> E[Load Dependencies<br/>in Parallel]
    E --> F[Build Dependency<br/>Graph]
    F --> G[Enable Cross-Resource<br/>Navigation]
    
    style A fill:#fff2cc
    style G fill:#e8f5e8

Loading Patterns

Phase 1: Basic text display (ULT/GLT + UST/GST) Phase 2: Enhanced navigation (versification, cross-references) Phase 3: Supporting resources (TN, TW, TA integration) Phase 4: Advanced features (word-level highlighting, quality assurance)

Dependency Resolution

Manifest-Driven Loading:

# en_ult/manifest.yaml
relation:
  - en/tn           # Translation Notes
  - en/twl          # Translation Words Links
  - hbo/uhb         # Hebrew Bible source
  - el-x-koine/ugnt # Greek New Testament source

Filtering Strategies:

Subject-based: Filter by resource subject field in manifest
Identifier-based: Filter by resource identifier patterns (tn, tw, ta, etc.)
Scope control: Minimal/Extended/Full dependency loading
Quality-based: Filter by checking_level (1-3)

Resource formats

There are three main resource formats used in unfoldingWord resources, USFM, TSV, and Markdown.

USFM

USFM is a markup language for Bible text. It is a subset of the USFM 3.1 specification and is used to represent the original language text of the Bible.

A usfm file should have the .usfm extension. e.g. 01-GEN.usfm

Example:

A verse like this:

1

The creation

1 In the beginning God created the heavens and the earth. 2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

Would be represented like this in a usfm file:

\c 1
\s The creation
\p
\v 1 In the beginning God created the heavens and the earth. \v 2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

where:

\c represents a chapter number
\s represents a section title
\p represents a paragraph
\v represents a verse number

TSV

TSV is a tab-separated values file format used to represent translation notes, translation words links, and translation questions.

A tsv file should have the .tsv extension. e.g. tn_GEN.tsv

Example of a translation notes TSV file:

a table like this:

Reference	ID	Tags	SupportReference	Quote	Occurrence	Note
1:1	abc1	grammar	rc://en/ta/man/translate/figs-metaphor	δοῦλος	1	Paul calls himself a slave, which indicates...
1:2	abc2	culture	rc://en/ta/man/translate/translate-names	Ἰησοῦ	1	This is the name of the Son of God...

would be represented like this in a TSV file:

Reference ID Tags SupportReference Quote Occurrence Note
1:1 abc1 grammar rc://en/ta/man/translate/figs-metaphor δοῦλος 1 Paul calls himself a slave, which indicates...
1:2 abc2 culture rc://en/ta/man/translate/translate-names Ἰησοῦ 1 This is the name of the Son of God...

where:

Reference is the reference to the original text
ID is a unique identifier for the note
Tags are the tags for the note
SupportReference is the reference to the support text
Quote is the quote from the original text
Occurrence is the occurrence of the quote in the original text
Note is the content of the translation note which is a markdown formatted string

Markdown

Markdown is a lightweight markup language for creating formatted text. It is used to add formatting to the content of all unfoldingWord resources.

A markdown file should have the .md extension. e.g. bc_GEN.md

Example of a translation Academy article Markdown file:

# Abstract Nouns
## What are abstract nouns and how do I deal with them in my translation?

### Description

Abstract nouns are nouns that refer to attitudes, qualities, events, or situations.
...

### Reason This Is a Translation Issue

The Bible that you translate from may use abstract nouns to express certain ideas. Your language might not use abstract nouns for some of those ideas.
...

### Examples From the Bible

> From **childhood** you have known the sacred writings … (2 Timothy 3:15a ULT)

The abstract noun “childhood” refers to when someone was a child.
...

### Translation Strategies

If an abstract noun would be natural and give the right meaning in your language, consider using it. If not, here is another option:

1. ...

### Examples of Translation Strategies Applied

1. ...

> … from **childhood** you have known the sacred writings … (2 Timothy 3:15a ULT)
>
> > Ever since **you were a child** you have known the sacred writings.
...

unfoldingWord Resource Containers and their relationships

1. UHB (Hebrew Bible)

format: usfm
language: he
subject: Bible
type: bundle (RC)
identifier: uhb
relation: en/ult,
link: rc://en/uhb/book/gen/01/02
content:
- 01-GEN.usfm
- 02-EXO.usfm
- 03-LEV.usfm
- ...
- 66-REV.usfm

See: https://git.door43.org/unfoldingWord/hbo_uhb

2. UGNT (Greek New Testament)

format: usfm
language: el
subject: Bible
type: bundle (RC)
identifier: ugnt
relation: en/ult
link: rc://en/ugnt/book/gen/01/02
content:
- 01-GEN.usfm
- 02-EXO.usfm
- 03-LEV.usfm
- ...
- 66-REV.usfm

See: https://git.door43.org/unfoldingWord/el-x-koine_ugnt

3. ULT (Literal Translation for Translators)

format: usfm
language: en
subject: Aligned Bible
type: bundle (RC)
identifier: ult
relation: hbo/uhb, el-x-koine/ugnt, en/ust
link: rc://en/ult/book/gen/01/02
content:
- 01-GEN.usfm
- 02-EXO.usfm
- 03-LEV.usfm
- ...
- 66-REV.usfm

See: https://git.door43.org/unfoldingWord/en_ult

4. UST (Simplified Translation for Translators)

format: usfm
language: en
subject: Aligned Bible
type: bundle (RC)
identifier: ust
relation: en/ult, el-x-koine/ugnt, hbo/uhb
link: rc://en/ust/book/gen/01/02
content:
- 01-GEN.usfm
- 02-EXO.usfm
- 03-LEV.usfm
- ...
- 66-REV.usfm

See: https://git.door43.org/unfoldingWord/en_ust

5. Translation Notes

format: tsv
language: en
subject: Translation Notes
type: help (RC)
identifier: tn
relation: en/ult, en/ust, el-x-koine/ugnt, hbo/uhb, en/ta
link: rc://en/tn/help/gen/01/02
TSV columns:
- Reference - the reference to the original text (e.g. 1:1-2)
- ID - a unique identifier for the note (e.g. abc1)
- Tags - the tags for the note (e.g. grammar, culture)
- SupportReference - the reference to the support text (e.g. rc://en/ta/man/translate/figs-metaphor)
- Quote - the quote from the original text (e.g. δοῦλος)
- Occurrence - the occurrence of the quote in the original text (e.g. 1)
- Note - the content of the translation note which is a markdown formatted string
content:
- tn_GEN.tsv
- tn_EXO.tsv
- tn_LEV.tsv
- ...
- tn_REV.tsv

The Quote, Occurrence, and Reference columns allows the note to be linked to the alignment layer, which in turn allows the note to reference the exact quote words in the original text and any other Bible resource that is aligned to it. This is the key to features like word-level highlighting where the user can clearly see the exact words that a note is referencing across multiple aligned texts.

The SupportReference column allows the note to reference the translationAcademy article that gives a more detailed explanation of the translation issue the note is referencing. For example, if the note is about the translation of the word "slave", the SupportReference would be rc://en/ta/man/translate/figs-metaphor, a tA article that explains how to translate metaphors.

An example of the content of a tn entry:

Reference	ID	Tags	SupportReference	Quote	Occurrence	Note
1:1	abc1	grammar	rc://en/ta/man/translate/figs-metaphor	δοῦλος	1	Here Paul calls himself a slave. This is a metaphor for a person who serves another person. Alternate translation: "A servant".

See: https://git.door43.org/unfoldingWord/en_tn

6. Translation Academy

format: md
language: en
subject: Translation Academy
type: man (RC)
identifier: ta
relation: en/tn, en/ult, en/ust, el-x-koine/ugnt, hbo/uhb
link: rc://en/ta/man/translate/figs-metaphor
content:
- checking/
  - ...
- man/
  - ...
- translate/
  - figs-metaphor/
    - title.md
    - subtitle.md
    - 01.md
  - translate-names/
    - ...
    - 01.md
  - ...

Academy articles are splitted into three files:

title.md - the title of the article
subtitle.md - the subtitle of the article
01.md - the content of the article

The content of the articles in the translate folder mostly adheres to the following structure:

# Title
## Subtitle

### Description
...
### Reason This Is a Translation Issue
...
### Examples From the Bible
...
### Translation Strategies
...
### Examples of Translation Strategies Applied
...

See: https://git.door43.org/unfoldingWord/en_ta

7. Translation Words

format: md
language: en
subject: Translation Words
type: dict (RC)
identifier: tw
relation: en/ult, en/ust, en/twl, el-x-koine/ugnt, hbo/uhb
link: rc://en/tw/dict/bible/other/servant
content:
- bible/
  - names/
    - aaron.md
    - ...
    - zechariah.md
  - kt/
    - apostle.md
    - ...
    - zion.md
  - other/
    - ...
    - bread.md

The content of a word article in the bible folder is a markdown file that contains the definition of the word. They mostly adhere to the following structure:

# word

## Definition
...

## Translation Suggestions
...

## Examples from the Bible stories
...

## Bible References
...

## Word Data
...

See: https://git.door43.org/unfoldingWord/en_tw

8. Translation Words Links

format: tsv
language: en
subject: Translation Words Links
type: help (RC)
identifier: twl
relation: en/ult, en/ust, en/tw, el-x-koine/ugnt, hbo/uhb
link: rc://en/twl/help/gen/01/02
TSV columns:
- Reference - the reference to the original text (e.g. 1:1-2)
- ID - a unique identifier for the note (e.g. xyz1)
- Tags - the tags for the note (kt, names, other)
- OrigWords - the original words that the note is referencing (e.g. δοῦλος)
- Occurrence - the occurrence of the quote in the original text (e.g. 1)
- TWLink - the link to the translation word (e.g. rc://*/tw/dict/bible/other/servant)
content:
- twl_GEN.tsv
- twl_EXO.tsv
- twl_LEV.tsv
- ...
- twl_REV.tsv

The TWLink column allows the TWL entry to reference the translation word that is related to the original word.

The OrigWords, Occurrence, and Reference columns allows the TWL entry to be linked to the alignment layer, which in turn allows the TWL entry to reference the exact quote words in the original text and any other Bible resource that is aligned to it. This is the key to features like word-level highlighting where the user can clearly see the exact words that a note is referencing across multiple aligned texts.

See: https://git.door43.org/unfoldingWord/en_twl

9. Translation Questions

format: tsv
language: en
subject: Translation Questions
type: help (RC)
identifier: tq
relation: en/ult, en/ust
link: rc://en/tq/help/gen/01/02
TSV columns:
- Reference - the reference to the original text (e.g. 1:1-2)
- ID - a unique identifier for the tQ entry (e.g. xyz1)
- Tags - the tags for the tQ entry
- Quote - It can be used to link the question to the original text (mostly not used)
- Occurrence - the occurrence of the quote in the original text
- Question - the question that the tQ entry is referencing
- Response - the response to the question
content:
- tq_GEN.tsv
- tq_EXO.tsv
- tq_LEV.tsv
- ...
- tq_REV.tsv

See: https://git.door43.org/unfoldingWord/en_tq

Alignment Layer

The alignment layer is a key component of the unfoldingWord resource system. It is a layer that allows the resources to be linked to the original text and other Bible translations. Currently alignment data is embedded in the gateway language USFM files, which are tokenized into words.

Alignment Syntax

unfoldingWord uses custom USFM markers for the alignment syntax. Which allows for precise word-level connections between gateway language translations and original Hebrew, Greek, and Aramaic texts.

USFM Alignment Syntax Structure

\zaln-s |x-strong="G35880" x-lemma="ὁ" x-morph="Gr,EA,,,,NMS," x-occurrence="1" x-occurrences="1" x-content="ὁ"\*\w The|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*

where:

\zaln-s - start of alignment pair
- |x-strong="G35880" - Strong's concordance number (G35880, H01234)
- |x-lemma="ὁ" - Dictionary form of the original word (ὁ, אֱלֹהִים)
- |x-morph="Gr,EA,,,,NMS," - Morphological parsing (Gr,EA,,,,NMS,)
- |x-occurrence="1" - Which occurrence in the verse (1, 2, 3...)
- |x-occurrences="1" - Total occurrences in the verse
- |x-content="ὁ" - Actual original language text being aligned
\* - end of alignment start marker
\w - word marker
- The - the word being aligned
- |x-occurrence="1" x-occurrences="1" - occurrence information
\w* - end of word marker
\zaln-e\* - end of alignment pair

Alignment Relationship Types

1. One-to-One: Single original word ↔ single gateway word

\zaln-s |x-strong="G2316" x-lemma="θεός" x-morph="Gr,N,,,,,NMS," x-occurrence="1" x-occurrences="1" x-content="θεὸς"\*\w God|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*

graph LR
    subgraph Greek ["Greek Word"]
        A["θεὸς"]
    end
    subgraph English ["English Word"]
        B["God"]
    end
    Greek --> English
    style Greek fill:#e1f5fe
    style English fill:#f3e5f5
    style A fill:#ffffff
    style B fill:#ffffff

2. One-to-Many: Single original word ↔ multiple gateway words (nested structure)

\zaln-s |x-strong="G2980" x-lemma="λαλέω" x-morph="Gr,V,IFA1,,P," x-occurrence="1" x-occurrences="1" x-content="λαλήσομεν"\*\w we|x-occurrence="1" x-occurrences="1"\w* \w will|x-occurrence="1" x-occurrences="1"\w* \w speak|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*

Here, the single Greek verb "λαλήσομεν" (we will speak) requires three English words to express the same meaning. All three gateway words are contained within one alignment pair because they all translate the single original word.

graph LR
    subgraph Greek ["Greek Word"]
        A["λαλήσομεν"]
    end
    subgraph English ["English Words (Nested Together)"]
        B["we"]
        C["will"]
        D["speak"]
    end
    Greek --> English
    style Greek fill:#e1f5fe
    style English fill:#f3e5f5
    style A fill:#ffffff
    style B fill:#ffffff
    style C fill:#ffffff
    style D fill:#ffffff

3. Many-to-One: Multiple original words ↔ single gateway word (overlapping alignments)

\zaln-s |x-strong="G1223" x-lemma="διά" x-morph="Gr,P,,,,,G,,," x-occurrence="1" x-occurrences="1" x-content="διὰ"\*\zaln-s |x-strong="G5124" x-lemma="οὗτος" x-morph="Gr,RD,,,,ANS," x-occurrence="1" x-occurrences="1" x-content="τοῦτο"\*\w therefore|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*\zaln-e\*

This example shows two Greek words "διὰ τοῦτο" (literally "through this") being translated as the single English word "therefore". The alignment markers are nested, with both original words wrapping around the single gateway word that captures their combined meaning.

graph LR
    subgraph Greek ["Greek Words (Nested Together)"]
        A["διὰ"]
        B["τοῦτο"]
    end
    subgraph English ["English Word"]
        C["therefore"]
    end
    Greek --> English
    style Greek fill:#e1f5fe
    style English fill:#f3e5f5
    style A fill:#ffffff
    style B fill:#ffffff
    style C fill:#ffffff

4. Many-to-Many: Multiple original words ↔ multiple gateway words (nested combinations)

\zaln-s |x-strong="G3756" x-lemma="οὐ" x-morph="Gr,D,,,,,,,," x-occurrence="1" x-occurrences="1" x-content="οὐ"\*\zaln-s |x-strong="G3361" x-lemma="μή" x-morph="Gr,D,,,,,,,," x-occurrence="1" x-occurrences="1" x-content="μὴ"\*\w by|x-occurrence="1" x-occurrences="1"\w* \w no|x-occurrence="1" x-occurrences="1"\w* \w means|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*\zaln-e\*

This shows a many-to-many relationship: the Greek double negative "οὐ μὴ" (literally "not not") is an emphatic negation construction that cannot be aligned as individual word pairs. You cannot say οὐ = "by" and μὴ = "no means" because neither Greek word individually means those English concepts. Instead, both Greek words together form an emphatic negation that requires the English idiomatic phrase "by no means" as an inseparable unit. The nested alignment markers show that both original words contribute to all three gateway words simultaneously.

graph LR
    subgraph Greek ["Greek Words (Nested Together)"]
        A["οὐ"]
        B["μὴ"]
    end
    subgraph English ["English Words (Nested Together)"]
        C["by"]
        D["no"]
        E["means"]
    end
    Greek --> English
    style Greek fill:#e1f5fe
    style English fill:#f3e5f5
    style A fill:#ffffff
    style B fill:#ffffff
    style C fill:#ffffff
    style D fill:#ffffff
    style E fill:#ffffff

Complex Alignment Example: Romans 1:1

Original Greek Text:

Παῦλος δοῦλος Χριστοῦ Ἰησοῦ

ULT/GLT Translation:

Paul, a servant of Christ Jesus

Alignment Visualization:

graph TD
    subgraph "Greek Original"
        subgraph "Personal Name"
            G1["Παῦλος<br/>(Paulos)<br/>G3972"]
        end
        subgraph "Role/Status"
            G2["δοῦλος<br/>(doulos)<br/>G1401"]
        end
        subgraph "Divine Name - Genitive Construction"
            G3["Χριστοῦ<br/>(Christou)<br/>G5547"]
            G4["Ἰησοῦ<br/>(Iēsou)<br/>G2424"]
        end
    end
    
    subgraph "English ULT/GLT"
        subgraph "Personal Name"
            E1["Paul"]
        end
        subgraph "Role Description"
            E2["a"]
            E3["servant"]
        end
        subgraph "Divine Name - Prepositional Phrase"
            E4["of"]
            E5["Christ"]
            E6["Jesus"]
        end
    end
    
    G1 --> E1
    G2 --> E2
    G2 --> E3
    G3 --> E4
    G3 --> E5
    G4 --> E6
    
    style G1 fill:#f9f,stroke:#333,stroke-width:2px
    style G2 fill:#f9f,stroke:#333,stroke-width:2px
    style G3 fill:#f9f,stroke:#333,stroke-width:2px
    style G4 fill:#f9f,stroke:#333,stroke-width:2px
    style E1 fill:#bbf,stroke:#333,stroke-width:2px
    style E2 fill:#bbf,stroke:#333,stroke-width:2px
    style E3 fill:#bbf,stroke:#333,stroke-width:2px
    style E4 fill:#bbf,stroke:#333,stroke-width:2px
    style E5 fill:#bbf,stroke:#333,stroke-width:2px
    style E6 fill:#bbf,stroke:#333,stroke-width:2px

USFM Alignment Code:

\v 1 \zaln-s |x-strong="G39720" x-lemma="Παῦλος" x-morph="Gr,N,,,,,NMS," x-occurrence="1" x-occurrences="1" x-content="Παῦλος"\*\w Paul|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*, \zaln-s |x-strong="G14010" x-lemma="δοῦλος" x-morph="Gr,N,,,,,NMS," x-occurrence="1" x-occurrences="1" x-content="δοῦλος"\*\w a|x-occurrence="1" x-occurrences="1"\w* \w servant|x-occurrence="1" x-occurrences="1"\w*\zaln-e\* \zaln-s |x-strong="G55470" x-lemma="Χριστός" x-morph="Gr,N,,,,,GMS," x-occurrence="1" x-occurrences="1" x-content="Χριστοῦ"\*\w of|x-occurrence="1" x-occurrences="1"\w* \w Christ|x-occurrence="1" x-occurrences="1"\w*\zaln-e\* \zaln-s |x-strong="G24240" x-lemma="Ἰησοῦς" x-morph="Gr,N,,,,,GMS," x-occurrence="1" x-occurrences="1" x-content="Ἰησοῦ"\*\w Jesus|x-occurrence="1" x-occurrences="1"\w*\zaln-e\*

3. Many-to-One: Multiple original words ↔ single gateway word

\zaln-s |x-strong="G1223" x-content="διὰ"\*\zaln-s |x-strong="G5124" x-content="τοῦτο"\*\w therefore\w*\zaln-e\*\zaln-e\*

4. Many-to-Many: Multiple original words ↔ multiple gateway words

\zaln-s |x-strong="G2570" x-content="καλῶς"\*\zaln-s |x-strong="G4160" x-content="ποιεῖς"\*\w do\w* \w good\w*\zaln-e\*\zaln-e\*

Versification Layer

The versification layer defines the standardized chapter and verse numbering system used by a resource, ensuring precise cross-resource references. Different Bible traditions use different versification schemes (e.g., KJV, RSV, Vulgate), which can vary in verse counts and numbering. The versification system is declared in the resource container's manifest file, allowing tools to properly align references when working with resources that may use different versification schemes.

Versification is commonly used by navigation components to know which references are available for the current resource. They are usually described in json files. See: https://github.com/Copenhagen-Alliance/versification-specification

Quality Assurance of the Resource System

The resource system is a complex system that requires a lot of quality assurance to ensure that the resources are correct and consistent.

Validation Requirements

Content Integrity:

Format validation (USFM, TSV, Markdown parsing)
Unicode/encoding compliance (UTF-8)
Cross-resource reference resolution
Version compatibility checks
Manifest schema validation

Translation Workflow Validation:

Alignment coverage (every original word aligned)
Note accuracy (references valid aligned words)
Link validity (TWL points to existing words)
Question effectiveness (TQ verifies translation meaning)

Extensibility

Creating New Resources

Requirements:

Follow RC directory structure and manifest format
Use established file formats (USFM, TSV, Markdown)
Implement RC link compatibility
Define appropriate subject and identifier
Maintain cross-resource linking patterns

Example New Resource:

# Custom commentary resource
dublin_core:
  identifier: 'biblical-commentary'
  subject: 'Commentary'
  type: 'help'
  format: 'text/tab-separated-values'
relation:
  - en/ult
  - en/ust
projects:
  - title: Genesis
    versification: ufw
    identifier: gen
    sort: 1
    path: ./bc_GEN.tsv
    categories:
      - bible-ot

Memory Management

Selective Loading: Only load required books/chapters
Resource Cleanup: Unload unused resources after time threshold
Streaming Parsing: Parse large USFM files in chunks
Index Generation: Create searchable indexes for faster lookup

Development Workflow

Local Development Setup

Environment Prerequisites

Basic Requirements:

Git client for repository access
HTTP client library for API access
JSON/YAML parser for manifest processing
UTF-8 text processing capabilities
Archive extraction utilities (ZIP/TAR)

Development Tools:

USFM parser/validator library
TSV processing capabilities
Markdown renderer (for TA/TW content)
Regular expression engine (for alignment parsing)
Caching mechanism (file system or memory-based)

Resource Discovery & Loading

Initial Setup Workflow:

Environment Configuration: Set base API URL and authentication tokens
Language Selection: Choose target gateway language(s) for development
Resource Discovery: Use Catalog API to identify available resources
Dependency Resolution: Parse manifest files to understand resource relationships
Selective Loading: Download only required resources for development scope

Development Data Sources:

Recommended Development Resources:
- Primary: en_ult, en_ust (English gateway texts)
- Testing: Small book sets (e.g., Philippians, 1 John)
- Validation: Complete resource sets for integration testing
- Performance: Large books (e.g., Genesis, Matthew) for optimization

Testing Strategies

Unit Testing Approaches

Format Validation Testing:

Test Categories:
1. USFM Parser Tests
   - Valid markup parsing
   - Alignment marker extraction
   - Error handling for malformed content
   - Character encoding validation

2. TSV Structure Tests
   - Column header validation
   - Data type verification
   - Reference format checking
   - Markdown content parsing

3. Manifest Processing Tests
   - Schema validation
   - Dependency resolution
   - Version compatibility checking
   - Link format validation

Integration Testing Patterns:

Cross-Resource Testing:
1. Reference Resolution
   - RC link validation
   - Cross-resource navigation
   - Version compatibility
   - Missing resource handling

2. Alignment System Testing
   - Word-level mapping accuracy
   - Occurrence counting validation
   - Nested alignment parsing
   - Performance under load

3. Content Consistency Testing
   - Translation note accuracy
   - Word link validation
   - Question relevance checking
   - Academy article references

Test Data Management

Sample Data Sets:

Minimal Set: Single chapter for basic functionality testing
Standard Set: Complete book for integration testing
Comprehensive Set: Multi-book collection for system testing
Stress Test Set: Complete Bible for performance validation

Mock Data Strategies:

API Response Mocking: Simulate catalog and repository API responses
Content Stubbing: Generate minimal valid content for testing
Error Simulation: Create invalid content for error handling tests
Performance Testing: Use large datasets for optimization validation

Debugging Approaches

Common Issues & Solutions

Resource Loading Problems:

Debugging Checklist:
1. API Connectivity
   - Verify network access to git.door43.org
   - Check authentication token validity
   - Validate rate limiting status
   - Test fallback mechanisms

2. Content Parsing Issues
   - Validate UTF-8 encoding
   - Check format specification compliance
   - Verify alignment marker syntax
   - Test cross-reference resolution

3. Performance Problems
   - Profile memory usage during loading
   - Analyze network request patterns
   - Measure parsing performance
   - Optimize caching strategies

Alignment Processing Debugging:

Common Alignment Issues:
1. Nested Marker Problems
   - Validate opening/closing pairs
   - Check nesting depth limits
   - Verify occurrence counting
   - Test edge case handling

2. Reference Resolution Failures
   - Validate Strong's number format
   - Check original language text availability
   - Verify word occurrence accuracy
   - Test missing reference handling

3. Performance Bottlenecks
   - Profile alignment parsing speed
   - Optimize regular expression usage
   - Implement incremental processing
   - Cache parsed alignment data

Debugging Tools & Techniques

Logging Strategies:

Request Logging: Track all API calls with timing information
Parse Logging: Record content parsing steps and errors
Reference Logging: Log cross-resource reference resolution
Performance Logging: Track memory usage and processing times

Validation Tools:

Content Validators: Real-time format and content validation
Link Checkers: Automated cross-reference validation
Performance Profilers: Memory and CPU usage analysis
Error Aggregators: Centralized error collection and analysis

Performance Optimization

Resource Loading Optimization

Caching Strategies:

Multi-Layer Caching:
1. Network Layer: Cache API responses with appropriate TTL
2. Content Layer: Cache parsed resources in memory/disk
3. Index Layer: Cache search indexes and cross-references
4. Result Layer: Cache computed results and aggregations

Parallel Processing:

Concurrency Patterns:
1. Parallel Resource Loading: Download multiple resources simultaneously
2. Background Processing: Parse content while loading additional resources
3. Lazy Loading: Load resources on-demand rather than upfront
4. Progressive Enhancement: Start with minimal features, add complexity

Memory Management

Resource Lifecycle:

Memory Optimization:
1. Load Planning: Determine minimum required resource set
2. Selective Loading: Load only needed books/chapters
3. Resource Cleanup: Unload unused resources after timeout
4. Garbage Collection: Monitor and optimize memory usage patterns

Processing Optimization:

Performance Patterns:
1. Streaming Parsing: Process large files in chunks
2. Index Generation: Pre-compute search indexes
3. Reference Caching: Cache frequently accessed cross-references
4. Batch Operations: Group similar operations for efficiency

Key Success Factors

Understand Dependencies: Word alignment is the central hub connecting all resources
Implement RC Standards: Follow Resource Container specification precisely
Leverage Catalog API: Use enhanced discovery capabilities when available
Design for Multi-Language: Support any gateway language implementation
Optimize Loading: Use parallel loading, caching, and dependency filtering
Handle Errors Gracefully: Implement robust error handling and fallbacks
Plan for Extensions: Design systems to accommodate new resource types
Cache Intelligently: Implement efficient caching for network and parsing operations

Automated Research & AI Analysis

This section provides guidance for AI systems, automated tools, and research platforms to systematically explore and analyze the unfoldingWord ecosystem for deeper insights and understanding.

Research Target Priorities

Primary Research Targets:

Resource	Research Focus	Key Insights to Extract
USFM 3.1 Specification	Format standards and syntax rules	Complete marker catalog, validation rules, alignment syntax extensions
Resource Container Specification	Structural standards and metadata	Dublin Core implementation, manifest schema, linking patterns
Gateway Language Manual	Translation methodology and principles	Quality standards, community workflows, theological frameworks
Door43 API Documentation	Technical implementation details	Complete endpoint catalog, response schemas, error handling patterns

Systematic Research Strategies

USFM Specification Deep Analysis

Research Objectives:

Technical Analysis:
1. Complete USFM marker inventory and classification
2. Alignment syntax variations and edge cases
3. Cross-reference syntax patterns and validation rules
4. Character encoding requirements and Unicode handling
5. Version compatibility matrices and migration patterns

Extraction Methodologies:

Automated Extraction:
1. Parse specification for all \marker definitions
2. Identify syntax patterns using regex analysis
3. Extract validation rules and constraints
4. Build comprehensive marker relationship graphs
5. Generate test case matrices for implementation validation

Resource Container Specification Research

Research Objectives:

Structural Analysis:
1. Complete Dublin Core metadata field catalog
2. Container type variations and file organization patterns
3. Cross-resource linking syntax and URI patterns
4. Version management and compatibility strategies
5. Quality assurance frameworks and validation approaches

Implementation Discovery:

Pattern Recognition:
1. Analyze manifest.yaml schemas across multiple examples
2. Extract resource relationship patterns and dependency graphs
3. Identify naming conventions and organizational structures
4. Map quality levels to implementation requirements
5. Document extension mechanisms for new resource types

Gateway Language Manual Analysis

Research Objectives:

Methodological Insights:
1. Translation quality frameworks and assessment criteria
2. Community collaboration patterns and governance models
3. Cultural adaptation strategies and localization principles
4. Theological review processes and accuracy standards
5. Technology integration approaches and tool recommendations

Workflow Documentation:

Process Mapping:
1. Extract step-by-step translation workflows
2. Identify quality checkpoints and review cycles
3. Document community roles and responsibility matrices
4. Analyze feedback incorporation and revision patterns
5. Map training requirements and skill development paths

API Documentation Comprehensive Analysis

Research Objectives:

Technical Implementation:
1. Complete endpoint catalog with request/response schemas
2. Authentication mechanisms and security implementations
3. Rate limiting strategies and performance optimization
4. Error handling patterns and recovery procedures
5. API versioning strategies and backward compatibility

Integration Patterns:

Usage Analysis:
1. Extract common API usage patterns and best practices
2. Identify performance bottlenecks and optimization strategies
3. Document error scenarios and resolution approaches
4. Analyze caching strategies and data management patterns
5. Map API evolution patterns and deprecation handling

Cross-Resource Analysis Frameworks

Ecosystem Coherence Verification

Consistency Analysis:

Cross-Reference Validation:
1. Verify alignment between specification documents
2. Identify gaps or conflicts between different standards
3. Analyze implementation consistency across resources
4. Validate cross-resource linking and dependency patterns
5. Document evolution patterns and version synchronization

Implementation Gap Identification

Gap Analysis Framework:

Coverage Assessment:
1. Map specification coverage against real-world implementations
2. Identify undocumented patterns in actual resource usage
3. Analyze community-developed extensions and variations
4. Document implementation-specific optimizations and adaptations
5. Track specification evolution based on practical needs

Automated Knowledge Extraction

Natural Language Processing Applications

Content Analysis:

NLP Research Applications:
1. Extract technical terminology and concept definitions
2. Build comprehensive glossaries and relationship maps
3. Identify procedural knowledge and workflow patterns
4. Analyze community feedback and improvement suggestions
5. Generate implementation guidelines from specification text

Machine Learning Pattern Recognition

Pattern Discovery:

ML Analysis Opportunities:
1. Resource usage pattern analysis across communities
2. Quality prediction models based on metadata patterns
3. Dependency optimization through usage analytics
4. Translation quality assessment through alignment analysis
5. Community collaboration effectiveness measurement

Research Output Integration

Knowledge Graph Construction

Semantic Relationships:

Graph Building:
1. Map all entities (resources, specifications, communities)
2. Define relationship types (depends_on, implements, references)
3. Incorporate temporal dimensions for evolution tracking
4. Build inference capabilities for gap identification
5. Enable query interfaces for developer assistance

Continuous Monitoring

Change Detection:

Evolution Tracking:
1. Monitor specification updates and version changes
2. Track API evolution and endpoint modifications
3. Analyze community feedback and issue resolution patterns
4. Identify emerging usage patterns and requirements
5. Generate alerts for breaking changes or compatibility issues

AI Research Best Practices

Ethical Considerations

Research Guidelines:

Responsible Research:
1. Respect API rate limits and server resources
2. Attribute insights to original specification sources
3. Maintain accuracy in extracted information representation
4. Consider community impact of automated analysis
5. Contribute findings back to community knowledge base

Quality Assurance

Validation Strategies:

Research Validation:
1. Cross-verify extracted information with multiple sources
2. Validate technical insights through implementation testing
3. Confirm community practices through direct observation
4. Peer review automated analysis results
5. Maintain provenance chains for all extracted knowledge

Glossary

Gateway Languages

Strategic languages (such as English, Spanish, French, Portuguese, Hindi) that serve as intermediary bridges between the original biblical languages (Hebrew, Greek, Aramaic) and target heart languages. Gateway languages are major languages that Mother Tongue Translators can understand and use as a foundation for translating Scripture into their native languages.

Mother Tongue Translators (MTTs)

Translators who are native speakers of the target language and are working to translate Scripture into their own heart language - the language they know best and speak most naturally.

Heart Languages

The native languages that Mother Tongue Translators speak most naturally and are translating Scripture into. These are the target languages for the final Bible translations that will serve local communities.

Getting Started

Developer Guides

Repository Formats

Migration & Conversion

Automation & MCP

unfoldingWord Developer Guide

unfoldingWord Bible Translation Resources Ecosystem: Developer Guide

Introduction

Table of Contents

TLDR

Overview

Mission & Core Concepts

Resource Ecosystem Architecture

Key Design Principles

Resource Types & Functions

Core Architecture

Technical Specifications

Resource Container (RC) Specification

Purpose & Structure

Example: English ULT Resource Container

Basic Requirements for Creating a Resource Container

Core RC Principles

How RC Provides Structure and Interconnection

RC Container Types & File Patterns

Manifest Structure (Key Fields)

RC Linking System

Hosting Resource Containers

API Reference

Platform Overview

Catalog API Endpoints

Repository API Endpoints

Authentication & Access Control

Rate Limiting & Performance

Repository Organization

Organizational Models

Repository Naming Conventions

Repository Structure

Discovery Patterns

Access Patterns & Performance

Repository Relationships

Quality Control & Governance

Migration & Reorganization

Best Practices

Integration Strategy

Loading Patterns

Dependency Resolution

Resource formats

USFM

1

The creation

TSV

Markdown

unfoldingWord Resource Containers and their relationships

1. UHB (Hebrew Bible)

2. UGNT (Greek New Testament)

3. ULT (Literal Translation for Translators)

4. UST (Simplified Translation for Translators)

5. Translation Notes

6. Translation Academy

7. Translation Words

8. Translation Words Links

9. Translation Questions

Alignment Layer

Alignment Syntax

USFM Alignment Syntax Structure

Alignment Relationship Types

Complex Alignment Example: Romans 1:1

Versification Layer

Quality Assurance of the Resource System

Validation Requirements

Extensibility

Creating New Resources

Memory Management

Development Workflow

Local Development Setup

Environment Prerequisites

Resource Discovery & Loading

Testing Strategies

Unit Testing Approaches

Test Data Management