Bible Text Repositories Guide

Introduction

This guide covers how to handle Bible text repositories in Door43, including both original language texts and gateway language translations. These repositories hold Bible text in USFM format, one file per book, and differ mainly in whether the text carries word-level alignment data.

Repository Types Covered:

  • Original Language Texts: Hebrew Bible (UHB), Greek New Testament (UGNT)
  • Gateway Language Translations: unfoldingWord Literal Text (ULT), unfoldingWord Simplified Text (UST)

Repository Types

Original Language Bible Repositories

Examples: hbo_uhb, el-x-koine_ugnt

Key Characteristics:

  • Subject: "Bible"
  • Container type: "bundle"
  • Content: Original Hebrew/Greek text
  • Format: USFM 3.0
  • Tokenization: Words marked for alignment
  • Scope: One testament per repository (UHB: 39 Old Testament books; UGNT: 27 New Testament books)

Gateway Language Bible Repositories

Examples: en_ult, en_ust

Key Characteristics:

  • Subject: "Aligned Bible"
  • Container type: "bundle"
  • Content: Gateway language translation
  • Format: USFM 3.0 with word alignment
  • Alignment: Word-level connections to original languages
  • Scope: Complete Bible (66 books)

How to Identify Bible Text Repositories

Step 1: Check the Manifest Subject

  • Look for dublin_core.subject field in manifest.yaml
  • Original texts have subject "Bible"
  • Gateway translations have subject "Aligned Bible"

Step 2: Verify Container Type

  • Check dublin_core.type field
  • Should be "bundle" for Bible text repositories

Step 3: Confirm File Structure

  • Look for numbered USFM files (01-GEN.usfm, 02-EXO.usfm, etc.)
  • Should have files for most or all books in scope (66 for a complete Bible, 39 or 27 for a single-testament repository)
  • Files follow the pattern: {NN}-{BOOK}.usfm
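
The three checks above can be combined into one small function. This is a minimal sketch, assuming the manifest has already been parsed into a Python dict (for example by a YAML library) and that the repository's file names are available as a list; the function name `classify_bible_repo` and the return labels are illustrative, not part of any Door43 API.

```python
import re

# Naming pattern from the guide: {NN}-{BOOK}.usfm, e.g. 01-GEN.usfm or A0-FRT.usfm.
BOOK_FILE = re.compile(r"^[0-9A-Z]{2}-[0-9A-Z]{3}\.usfm$")


def classify_bible_repo(manifest, filenames):
    """Return 'original', 'aligned', or None for a parsed manifest dict.

    Hypothetical helper: the labels are chosen for this example only.
    """
    core = manifest.get("dublin_core", {})
    # Step 2: the container type must be "bundle".
    if core.get("type") != "bundle":
        return None
    # Step 3: at least one file must follow the numbered USFM pattern.
    if not any(BOOK_FILE.match(name) for name in filenames):
        return None
    # Step 1: the subject decides between the two repository types.
    return {"Bible": "original", "Aligned Bible": "aligned"}.get(core.get("subject"))
```

A stricter version could also compare the number of matching files with the length of the projects list before accepting the repository.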

Manifest Structure for Bible Texts

Original Language Bible Manifest

dublin_core:
  identifier: 'uhb'                    # Resource identifier
  language:
    identifier: 'hbo'                  # Biblical Hebrew language code
    direction: 'rtl'                   # Right-to-left text direction
  subject: 'Bible'                     # Original language text
  type: 'bundle'                       # Complete collection
  version: '2.1.30'                    # Resource version
  format: 'text/usfm3'                # USFM 3.0 format

projects:                              # One entry per book (39 Old Testament books in UHB)
  - identifier: 'gen'                  # Book identifier
    title: 'Genesis'                   # Human-readable title
    path: './01-GEN.usfm'             # File path
    sort: 1                           # Display order
    versification: 'original'         # Versification system
    categories: ['bible-ot']          # Old Testament

Gateway Language Bible Manifest

dublin_core:
  identifier: 'ult'                    # Resource identifier
  language:
    identifier: 'en'                   # English language code
    direction: 'ltr'                   # Left-to-right text direction
  subject: 'Aligned Bible'             # Gateway language with alignment
  type: 'bundle'                       # Complete collection
  version: '86'                        # Resource version
  format: 'text/usfm3'                # USFM 3.0 format
  relation:                           # Dependencies
    - 'en/tw'                         # Translation Words
    - 'en/tn'                         # Translation Notes
    - 'hbo/uhb?v=2.1.30'             # Hebrew Bible source
    - 'el-x-koine/ugnt?v=0.34'       # Greek NT source

projects:                             # All 66 books
  - identifier: 'gen'                 # Book identifier
    title: 'Genesis'                  # Human-readable title
    path: './01-GEN.usfm'            # File path
    sort: 1                          # Display order
    versification: 'ufw'             # Versification system
    categories: ['bible-ot']         # Old Testament

File Structure Patterns

Original Language Repository Structure

hbo_uhb/
├── 📄 manifest.yaml                 # Resource Container manifest
├── 📄 LICENSE.md                    # CC BY-SA 4.0 license
├── 📄 01-GEN.usfm                   # Genesis
├── 📄 02-EXO.usfm                   # Exodus
├── 📄 03-LEV.usfm                   # Leviticus
├── ...                              # All Old Testament books
├── 📄 39-MAL.usfm                   # Malachi (last OT book)
└── 📄 README.md                     # Repository documentation

Gateway Language Repository Structure

en_ult/
├── 📄 manifest.yaml                 # Resource Container manifest
├── 📄 LICENSE.md                    # CC BY-SA 4.0 license
├── 📄 A0-FRT.usfm                   # Front matter
├── 📄 01-GEN.usfm                   # Genesis with alignment
├── 📄 02-EXO.usfm                   # Exodus with alignment
├── ...                              # Remaining Old Testament books
├── 📄 41-MAT.usfm                   # Matthew (first NT book; 40 is skipped)
├── ...                              # Remaining New Testament books
└── 📄 67-REV.usfm                   # Revelation

Content Analysis

Original Language USFM Content

Hebrew Bible Sample (01-GEN.usfm):

\id GEN unfoldingWord® Hebrew Bible
\usfm 3.0
\ide UTF-8
\h בראשית
\toc1 בראשית
\toc2 בראשית
\toc3 בר
\mt בראשית

\c 1
\p
\v 1 בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

Key Features:

  • Hebrew text in right-to-left direction
  • Standard USFM markers
  • No alignment data (this is the source)
  • Tokenized for alignment purposes

Gateway Language USFM Content

English ULT Sample (01-GEN.usfm):

\id GEN unfoldingWord® Literal Text
\usfm 3.0
\ide UTF-8
\h Genesis
\toc1 The Book of Genesis
\toc2 Genesis
\toc3 Gen
\mt Genesis

\c 1
\p
\v 1 \zaln-s |x-strong="H07225" x-lemma="רֵאשִׁית" x-content="בְּרֵאשִׁית"\*\w In|x-occurrence="1"\w* \w the|x-occurrence="1"\w* \w beginning|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H0430" x-lemma="אֱלֹהִים" x-content="אֱלֹהִים"\*\w God|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H01254" x-lemma="בָּרָא" x-content="בָּרָא"\*\w created|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H0853" x-lemma="אֵת" x-content="אֵת"\*\zaln-e\* \zaln-s |x-strong="H08064" x-lemma="שָׁמַיִם" x-content="הַשָּׁמַ֖יִם"\*\w the|x-occurrence="2"\w* \w heavens|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H0853" x-lemma="אֵת" x-content="וְאֵת"\*\w and|x-occurrence="1"\w*\zaln-e\* \zaln-s |x-strong="H0776" x-lemma="אֶרֶץ" x-content="הָאָֽרֶץ"\*\w the|x-occurrence="3"\w* \w earth|x-occurrence="1"\w*\zaln-e\*.

Key Features:

  • English translation text
  • Extensive word alignment markers (\zaln-s, \zaln-e, \w)
  • Strong's concordance numbers
  • Hebrew lemma (x-lemma) and the aligned source word (x-content)
  • Occurrence tracking for precise alignment
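
To make the marker structure concrete, the following sketch pulls each alignment milestone out of a verse and pairs its source attributes with the gateway-language words it wraps. It is a minimal illustration based only on the marker shapes visible in the sample above; the function name `extract_alignments` is hypothetical, and a production parser would need to cover the rest of USFM.

```python
import re

# One alternative per marker shape seen in the ULT sample.
TOKEN = re.compile(
    r'\\zaln-s\s*\|(?P<attrs>[^\\]*)\\\*'           # milestone start with attributes
    r'|(?P<end>\\zaln-e\\\*)'                        # milestone end
    r'|\\w\s+(?P<word>[^|\\]+)(?:\|[^\\]*)?\\w\*'    # gateway-language word
)
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def extract_alignments(usfm):
    """Return a list of (source_attrs, target_words), one per milestone."""
    results, open_milestones = [], []
    for match in TOKEN.finditer(usfm):
        if match.group("attrs") is not None:
            entry = (dict(ATTR.findall(match.group("attrs"))), [])
            results.append(entry)
            open_milestones.append(entry)
        elif match.group("end"):
            if open_milestones:
                open_milestones.pop()
        else:
            # Attach the word to every open milestone, so nesting is tolerated.
            for _, words in open_milestones:
                words.append(match.group("word"))
    return results
```

Run against the Genesis 1:1 sample, the first entry would carry x-strong "H07225" with the words "In", "the", "beginning", and the untranslated object marker (H0853) would come back with an empty word list.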

How to Process Bible Text Repositories

Step 1: Identify Repository Type

Check Repository Characteristics:

  • Verify the manifest subject is "Bible" or "Aligned Bible"
  • Confirm container type is "bundle"
  • Look for numbered USFM files covering multiple books

Determine Alignment Level:

  • Original language texts: No alignment data
  • Gateway language texts: Extensive alignment markers

Step 2: Extract Bible Structure Information

From Manifest:

  • Get the complete book list from projects[] array
  • Note the versification system used
  • Check for book categories (bible-ot, bible-nt, bible-frt)

Expected Book Count:

  • Complete Bible: 66 books, plus an optional front matter entry
  • Old Testament only: 39 books
  • New Testament only: 27 books
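
As a rough illustration of this step, the sketch below groups the entries of the projects array by category so the counts can be compared with the expectations above. It assumes the manifest is already a parsed dict with the fields shown in the manifest examples; `summarize_projects` and the "uncategorized" fallback label are names invented for this example.

```python
from collections import defaultdict


def summarize_projects(manifest):
    """Group book identifiers by category, ordered by each project's 'sort' value."""
    groups = defaultdict(list)
    projects = sorted(manifest.get("projects", []), key=lambda p: p.get("sort", 0))
    for project in projects:
        # Projects without categories land in a fallback group.
        for category in project.get("categories") or ["uncategorized"]:
            groups[category].append(project["identifier"])
    return dict(groups)
```

For a complete gateway language Bible, the lengths of the "bible-ot" and "bible-nt" lists would be expected to be 39 and 27.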

Step 3: Process File Organization

File Naming Pattern:

  • Books numbered by canonical order: 01-GEN.usfm, 02-EXO.usfm
  • New Testament typically starts at 41: 41-MAT.usfm through 67-REV.usfm (number 40 is skipped); treat a 40-MAT.usfm prefix as a variant
  • Front matter: A0-FRT.usfm (if present)
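
The naming pattern can be turned into a small parser. The sketch below is an assumption-laden example: it treats numeric prefixes up to 39 as Old Testament, 40 and above as New Testament (so either Matthew numbering is accepted), and anything non-numeric, such as A0, as other material. The function name `parse_book_filename` and the section labels are illustrative.

```python
import re

FILE_RE = re.compile(r"^(?P<prefix>[0-9A-Z]{2})-(?P<book>[0-9A-Z]{3})\.usfm$")


def parse_book_filename(name):
    """Split a file name like '01-GEN.usfm' into prefix, book code, and section."""
    match = FILE_RE.match(name)
    if match is None:
        return None  # Not a numbered book file (manifest.yaml, README.md, ...)
    prefix, book = match.group("prefix"), match.group("book")
    if not prefix.isdigit():
        section = "other"   # e.g. A0-FRT.usfm front matter
    elif int(prefix) <= 39:
        section = "ot"
    else:
        section = "nt"
    return {"prefix": prefix, "book": book.lower(), "section": section}
```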

File Size Expectations (approximate; aligned gateway language files are larger):

  • Small books (Philemon, 2-3 John): 5-15 KB
  • Medium books (Ephesians, Philippians): 20-50 KB
  • Large books (Genesis, Psalms): 100-500 KB
  • Very large books (1 Chronicles): 500+ KB

Step 4: Handle Content Access

For Original Language Texts:

  • Use raw URLs for direct access to Hebrew/Greek text
  • Content is ready for parsing with standard USFM parser
  • No alignment processing needed

For Gateway Language Texts:

  • Use raw URLs for content access
  • Content requires alignment-aware USFM parser
  • Process alignment markers for word-level features

Step 5: Process Dependencies

Original Language Texts:

  • Usually have minimal dependencies
  • May reference gateway language translations

Gateway Language Texts:

  • Always reference original language sources
  • Reference support resources (TN, TW, TA, TQ)
  • May reference parallel gateway translations
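
The relation entries shown in the gateway language manifest, such as 'hbo/uhb?v=2.1.30' and 'en/tw', follow a language/resource shape with an optional version query. A minimal sketch for splitting them, assuming that shape holds for every entry (`parse_relation` is an illustrative name):

```python
from urllib.parse import parse_qs


def parse_relation(relation):
    """Split 'lang/resource?v=version' into its parts; version may be None."""
    target, _, query = relation.partition("?")
    language, _, resource = target.partition("/")
    version = parse_qs(query).get("v", [None])[0]
    return {"language": language, "resource": resource, "version": version}
```

Entries whose resource is uhb or ugnt identify the original language source a gateway language text is aligned to, together with the version it was aligned against.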

Application Integration

How to Display Bible Text Repositories in Preview Apps

Step 1: Present Repository Information

  • Show the Bible name and language clearly
  • Indicate if it's original language or gateway language
  • Display the translation approach (literal vs simplified for gateway languages)

Step 2: Organize Book Navigation

  • Group books by testament (Old Testament, New Testament)
  • Show each book's identifier alongside its full title
  • Include book categories if available

Step 3: Handle Alignment Features

  • For gateway language texts, indicate that word alignment is available
  • Show related resources that work with this Bible
  • Provide access to alignment-dependent features

Step 4: Show Content Statistics

  • Display total number of books available
  • Show versification system used
  • Indicate checking level if available

How to Use Bible Text Repositories in Editing Apps

Step 1: Set Up Bible Access

  • Configure access to all book files based on projects array
  • Set up navigation between books and chapters
  • Handle versification system appropriately

Step 2: Configure Alignment Processing (for gateway languages)

  • Set up alignment marker parsing
  • Enable word-level highlighting features
  • Connect to original language sources for alignment data
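
Editing apps often need the plain gateway language text next to the aligned form. The sketch below strips the alignment milestones and \w wrappers from a line while leaving other USFM markers such as \v in place. It is a simplified example built on the marker shapes in the ULT sample, not a full USFM parser, and `strip_alignment` is a hypothetical name.

```python
import re


def strip_alignment(usfm):
    """Remove \\zaln-s / \\zaln-e milestones and unwrap \\w ... \\w* words."""
    # Drop milestone start and end markers together with their attributes.
    text = re.sub(r'\\zaln-[se][^\\]*\\\*', '', usfm)
    # Keep only the word inside each \w ... \w* wrapper.
    text = re.sub(r'\\w\s+([^|\\]+)(?:\|[^\\]*)?\\w\*', r'\1', text)
    # Collapse the runs of spaces left behind by removed markers.
    return re.sub(r'[ \t]+', ' ', text).strip()
```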

Step 3: Enable Related Resources

  • Configure access to Translation Notes for verse guidance
  • Set up Translation Words for term definitions
  • Enable Translation Questions for quality checking

Step 4: Handle Large File Sizes

  • Implement efficient loading for large books
  • Consider chapter-by-chapter loading for very large books
  • Cache frequently accessed books locally
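
Chapter-by-chapter loading can start from a simple split on \c markers. The following is a minimal sketch, assuming each chapter marker sits on its own line as in the samples above; `split_chapters` is an illustrative name.

```python
import re


def split_chapters(usfm):
    """Return (header, {chapter_number: chapter_text}) for one book."""
    # re.split with a capture group yields: header, number, body, number, body, ...
    parts = re.split(r'(?m)^\\c\s+(\d+)\s*$', usfm)
    header, chapters = parts[0].strip(), {}
    for number, body in zip(parts[1::2], parts[2::2]):
        chapters[int(number)] = body.strip()
    return header, chapters
```

An app can then render only the requested chapter and keep the remaining chapter strings unparsed until they are needed.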

Differences Between Bible Repository Types

Aspect        Original Language     Gateway Language
Subject       "Bible"               "Aligned Bible"
Alignment     None                  Extensive word alignment
Dependencies  Minimal               Many (TN, TW, TA, etc.)
Complexity    Standard USFM         USFM + alignment markers
File Size     Standard              Larger due to alignment
Use Case      Source reference      Translation base
Target Users  Advanced translators  All translators

Best Practices

1. File Access Strategy

  • Use raw URLs for better performance with large files
  • Cache frequently accessed books locally
  • Handle file size variations appropriately

2. Alignment Processing

  • For gateway languages, always process alignment markers
  • Maintain connection to original language sources
  • Enable word-level features only for aligned texts

3. Navigation and Display

  • Provide clear testament and book organization
  • Show translation approach (literal vs simplified)
  • Indicate alignment availability to users

4. Performance Optimization

  • Load books on demand rather than all at once
  • Cache manifest data for quick book enumeration
  • Use efficient USFM parsing for large files
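
On-demand loading with a bounded cache might look like the sketch below. The fetch function is injected by the caller, so the class makes no assumption about how files are downloaded; `BookCache` and its `max_books` parameter are names chosen for this example.

```python
from collections import OrderedDict


class BookCache:
    """Load books on first request and keep only the most recently used ones."""

    def __init__(self, fetch, max_books=8):
        self._fetch = fetch          # callable: path -> USFM text
        self._max = max_books
        self._books = OrderedDict()

    def get(self, path):
        if path in self._books:
            # Mark the book as recently used and serve it from memory.
            self._books.move_to_end(path)
            return self._books[path]
        text = self._fetch(path)
        self._books[path] = text
        if len(self._books) > self._max:
            # Evict the least recently used book.
            self._books.popitem(last=False)
        return text
```

Keeping the limit small matters most for aligned gateway language books, which the guide notes are larger than their original language counterparts.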

Common Issues and Solutions

Issue 1: Large File Sizes

Problem: Some books (like Psalms) can be very large with alignment data.

Solution: Implement progressive loading or chapter-by-chapter access.

Issue 2: Alignment Marker Complexity

Problem: Gateway language texts have complex alignment syntax.

Solution: Use specialized USFM parsers that handle alignment markers.

Issue 3: Versification Differences

Problem: Different Bible traditions use different verse numbering.

Solution: Always check the versification field of each project and map references accordingly.

This guide is based on analysis of Door43 Bible text repositories and should be used alongside the main Door43 API Developer Guide.