MCP Instructions: Door43 Repository Analyzer

Overview

This document instructs an MCP (Model Context Protocol) system on how to analyze Door43 repositories through the Door43 API, extract their structure and content information, and automatically create or update repository format guides.

MCP Capabilities Required

Core Functions Needed

  1. HTTP Request Capability: Make API calls to Door43 endpoints
  2. File Processing: Read, parse, and analyze various file formats (YAML, JSON, USFM, TSV, Markdown)
  3. Content Analysis: Extract patterns, structures, and metadata from repository content
  4. Documentation Generation: Create and update markdown guides based on analysis
  5. Cross-Repository Comparison: Identify patterns across multiple repositories

Data Processing Requirements

  • YAML Parser: For Resource Container manifest.yaml files
  • JSON Parser: For Scripture Burrito metadata.json and tool manifests
  • Text Analysis: For USFM, TSV, and Markdown content structure analysis
  • Pattern Recognition: Identify file naming patterns, directory structures, content types

Repository Analysis Framework

Phase 1: Repository Discovery and Classification

Step 1: Discover Target Repositories

Use Catalog API for Published Resources:

GET https://git.door43.org/api/v1/catalog/search?stage=prod

Use Gitea API for All Repositories:

GET https://git.door43.org/api/v1/orgs/unfoldingWord/repos
GET https://git.door43.org/api/v1/orgs/BurritoTruck/repos

Filter by Repository Patterns:

  • Resource Containers: {lang}_{resource} pattern
  • Tool-generated: Look for specific naming patterns
  • Scripture Burritos: Check organizations like BurritoTruck
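
The name-based filter above can be sketched as follows. This is a minimal illustration, not a definitive rule: the regular expression is an assumption about what {lang}_{resource} names look like (a lowercase language code with optional subtags, an underscore, then a resource identifier), and the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://git.door43.org/api/v1"

# Assumed shape of an RC repository name: language code (with optional
# subtags such as "-419"), an underscore, then a resource identifier.
RC_NAME = re.compile(
    r"^(?P<lang>[a-z]{2,3}(?:-[A-Za-z0-9]+)*)_(?P<resource>[a-z0-9]+)$"
)


def catalog_search_url(stage="prod"):
    """Build the catalog search URL for published resources."""
    return f"{BASE_URL}/catalog/search?{urlencode({'stage': stage})}"


def split_rc_name(repo_name):
    """Return (lang, resource) if the name fits {lang}_{resource}, else None."""
    match = RC_NAME.match(repo_name)
    return (match["lang"], match["resource"]) if match else None
```

A name that matches is only a candidate Resource Container; the specification file found in Step 2 decides the actual classification.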

Step 2: Initial Repository Classification

For Each Repository:

  1. Get Repository Metadata:

    GET /api/v1/repos/{owner}/{repo}
    
  2. Get File Listing:

    GET /api/v1/repos/{owner}/{repo}/contents
    
  3. Identify Specification Type:

    • Look for manifest.yaml, metadata.json, manifest.json
    • Download and parse specification files
    • Classify as RC, Scripture Burrito, translationCore, translationStudio, or Unknown
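
As a rough sketch of item 3, the function below classifies a repository from its root file listing. It assumes each listing entry is a dictionary with "name" and "type" keys, as the Gitea contents endpoint returns for a directory; the precedence order when several specification files are present is an assumption, not something the source defines.

```python
def classify_by_files(entries):
    """Classify a repository from the entries of its root directory listing.

    Returns "RC", "SB", "tool-generated", or "Unknown". Distinguishing
    translationCore from translationStudio requires parsing manifest.json.
    """
    names = {e["name"] for e in entries if e.get("type") == "file"}
    # Assumed precedence: RC manifest first, then Scripture Burrito metadata,
    # then a tool manifest.
    if "manifest.yaml" in names:
        return "RC"
    if "metadata.json" in names:
        return "SB"
    if "manifest.json" in names:
        return "tool-generated"
    return "Unknown"
```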

Phase 2: Deep Repository Analysis

Step 3: Specification Analysis

For Resource Container Repositories:

  1. Download Manifest:

    GET /api/v1/repos/{owner}/{repo}/contents/manifest.yaml
    
  2. Extract Key Information:

    • dublin_core.subject - Resource type
    • dublin_core.type - Container type
    • dublin_core.identifier - Resource identifier
    • projects[] - File structure
    • dublin_core.relation[] - Dependencies
  3. Analyze Projects Structure:

    • File naming patterns
    • Content types (USFM, TSV, Markdown)
    • Organization principles
    • Scope and coverage
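
The extraction in items 2 and 3 might look like the sketch below. It takes a manifest that has already been parsed into a dictionary (for example with a YAML library's safe loader, which is not shown here). It assumes the relation list sits under dublin_core and that each project entry carries "identifier" and "path" keys; summarize_rc_manifest is a hypothetical helper name.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_rc_manifest(manifest):
    """Pull the key fields out of a parsed Resource Container manifest."""
    dc = manifest.get("dublin_core") or {}
    projects = manifest.get("projects") or []
    # Count project paths by file extension; paths without an extension
    # are assumed to be directories.
    extensions = Counter(
        PurePosixPath(p.get("path", "")).suffix.lower() or "(directory)"
        for p in projects
    )
    return {
        "subject": dc.get("subject"),
        "type": dc.get("type"),
        "identifier": dc.get("identifier"),
        "relations": list(dc.get("relation") or []),
        "project_ids": [p.get("identifier") for p in projects],
        "path_extensions": dict(extensions),
    }
```

The extension counts give a first indication of content types (USFM, TSV, Markdown) before any content file is downloaded.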

For Scripture Burrito Repositories:

  1. Download Metadata:

    GET /api/v1/repos/{owner}/{repo}/contents/metadata.json
    
  2. Extract Key Information:

    • meta.format and meta.version
    • type.flavorType - Resource category
    • ingredients{} - File organization
    • relationships[] - Dependencies
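
A comparable sketch for Scripture Burrito metadata follows. It works on metadata.json after JSON parsing. It assumes that type.flavorType may be either a plain string or an object with a "name" key, and that each ingredient entry may carry a "mimeType" field; both are assumptions to verify against real repositories.

```python
from collections import Counter


def summarize_burrito(metadata):
    """Pull the key fields out of parsed Scripture Burrito metadata."""
    meta = metadata.get("meta") or {}
    flavor_type = (metadata.get("type") or {}).get("flavorType")
    # Assumption: flavorType is a string or an object with a "name" key.
    if isinstance(flavor_type, dict):
        flavor_type = flavor_type.get("name")
    ingredients = metadata.get("ingredients") or {}
    mime_types = Counter(
        info.get("mimeType", "unknown") for info in ingredients.values()
    )
    return {
        "format": meta.get("format"),
        "version": meta.get("version"),
        "flavor_type": flavor_type,
        "ingredient_paths": sorted(ingredients),
        "mime_types": dict(mime_types),
        "relationships": list(metadata.get("relationships") or []),
    }
```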

For Tool-Generated Repositories:

  1. Download Manifest:

    GET /api/v1/repos/{owner}/{repo}/contents/manifest.json
    
  2. Detect Tool Type:

    • tc_version or generator.name for translationCore
    • package_version and generator.name for translationStudio
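
The detection rule above could be expressed as in this sketch. The "tc" and "ts" prefixes checked in generator.name are assumptions for illustration only; confirm the actual generator names in real manifests before relying on them.

```python
def detect_tool(manifest):
    """Guess which tool produced a parsed manifest.json."""
    generator = manifest.get("generator")
    name = ""
    if isinstance(generator, dict):
        name = str(generator.get("name", "")).lower()
    # translationCore: tc_version present, or a generator name that
    # starts with "tc" (assumed prefix).
    if "tc_version" in manifest or name.startswith("tc"):
        return "translationCore"
    # translationStudio: package_version together with a generator name
    # that starts with "ts" (assumed prefix).
    if "package_version" in manifest and name.startswith("ts"):
        return "translationStudio"
    return "Unknown"
```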

Step 4: Content Analysis

Sample Content Files:

  1. Download Sample Files (first few of each type):

    GET /api/v1/repos/{owner}/{repo}/contents/{file_path}
    
  2. Analyze Content Structure:

    • USFM Files: Count chapters, verses, check for alignment markers
    • TSV Files: Analyze headers, row structure, content type
    • Markdown Files: Examine structure, headers, content organization
  3. Extract Content Patterns:

    • File size distributions
    • Content complexity levels
    • Cross-reference patterns
    • Structural variations
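
The USFM and TSV checks in item 2 can be approximated with the standard library alone, as sketched below. The sketch counts \c and \v markers, treats the presence of a \zaln-s marker as evidence of alignment data, and reports TSV rows whose column count differs from the header. It is a structural probe under those assumptions, not a full USFM parser.

```python
import csv
import io
import re

CHAPTER = re.compile(r"^\\c\s+(\d+)", re.MULTILINE)
VERSE = re.compile(r"\\v\s+(\d+)")
ALIGNMENT = re.compile(r"\\zaln-s\b")


def analyze_usfm(text):
    """Count chapter and verse markers and check for alignment markers."""
    return {
        "chapters": len(CHAPTER.findall(text)),
        "verses": len(VERSE.findall(text)),
        "aligned": bool(ALIGNMENT.search(text)),
    }


def analyze_tsv(text):
    """Report the header row, the data row count, and rows of unexpected width."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    headers = rows[0] if rows else []
    data = rows[1:]
    return {
        "headers": headers,
        "row_count": len(data),
        "ragged_rows": sum(1 for row in data if len(row) != len(headers)),
    }
```

A verse range such as \v 1-2 counts as one marker here, so the verse count is a count of markers rather than of individual verses.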

Phase 3: Pattern Recognition and Guide Generation

Step 5: Cross-Repository Pattern Analysis

Group Repositories by Type:

  • Resource Container by subject
  • Scripture Burrito by flavor type
  • Tool-generated by tool and version

Identify Common Patterns:

  • File naming conventions
  • Directory structures
  • Content organization principles
  • Cross-reference systems
  • Metadata patterns

Detect Variations:

  • Non-standard implementations
  • Version differences
  • Tool-specific variations
  • Custom extensions
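
One simple way to surface common patterns and variations is to count which file names recur within each group, as in this sketch. The input shape (a "group" label, a "name", and a "files" list per repository) and the 0.8 threshold are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def common_files_by_group(repos, threshold=0.8):
    """Return, per group, the file names present in at least
    `threshold` of that group's repositories."""
    groups = defaultdict(list)
    for repo in repos:
        groups[repo["group"]].append(set(repo["files"]))
    common = {}
    for group, file_sets in groups.items():
        counts = Counter(name for files in file_sets for name in files)
        needed = threshold * len(file_sets)
        common[group] = sorted(n for n, c in counts.items() if c >= needed)
    return common


def find_variations(repos, common):
    """List repositories that lack a file common to their group."""
    variations = {}
    for repo in repos:
        missing = sorted(set(common[repo["group"]]) - set(repo["files"]))
        if missing:
            variations[repo["name"]] = missing
    return variations
```

Repositories reported by find_variations are candidates for the "non-standard implementations" category and deserve manual review before a guide documents them.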

Step 6: Guide Creation and Updates

Determine Guide Needs:

  • New repository types requiring new guides
  • Existing guides needing updates
  • Missing information in current guides
  • Outdated information requiring correction

Generate Guide Content:

  • Use repository analysis data to populate guide templates
  • Include real examples from analyzed repositories
  • Document discovered patterns and variations
  • Provide specific processing instructions

MCP Analysis Instructions

Repository Analysis Workflow

Instruction Set 1: Basic Repository Analysis

TASK: Analyze Door43 repository structure and classification

INPUTS:
- Repository owner and name
- Door43 API base URL: https://git.door43.org/api/v1

PROCESS:
1. GET repository metadata from /repos/{owner}/{repo}
2. GET repository contents from /repos/{owner}/{repo}/contents
3. IDENTIFY specification files (manifest.yaml, metadata.json, manifest.json)
4. DOWNLOAD and PARSE specification files
5. CLASSIFY repository type based on specification content
6. EXTRACT key metadata fields
7. ANALYZE file structure and organization
8. SAMPLE content files for structure analysis

OUTPUT:
- Repository classification (RC = Resource Container, SB = Scripture Burrito, tC = translationCore, tS = translationStudio, or Unknown)
- Specification version and key fields
- File structure analysis
- Content type distribution
- Processing recommendations

Instruction Set 2: Content Structure Analysis

TASK: Analyze repository content structure and patterns

INPUTS:
- Repository classification from previous analysis
- File listing from repository contents
- Sample content files

PROCESS:
1. CATEGORIZE files by extension and naming patterns
2. DOWNLOAD sample files of each major type
3. ANALYZE content structure:
   - USFM: Extract markers, count verses/chapters, check alignment
   - TSV: Analyze headers, detect content type, count rows
   - Markdown: Examine structure, headers, content organization
   - JSON: Parse structure, identify purpose, extract key fields
4. IDENTIFY cross-reference patterns
5. DETECT content complexity levels
6. EXTRACT content characteristics

OUTPUT:
- Content type analysis
- File organization patterns
- Cross-reference systems
- Content complexity assessment
- Processing requirements
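
Steps 1 and 3 of this instruction set can be sketched for file categorization and Markdown structure as follows. The extension-to-type table is an assumption that covers only the formats named in this document, and only ATX-style Markdown headings (lines starting with #) are recognized.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to content type.
CONTENT_TYPES = {
    ".usfm": "USFM",
    ".tsv": "TSV",
    ".md": "Markdown",
    ".json": "JSON",
    ".yaml": "YAML",
}
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def categorize_paths(paths):
    """Group file paths by content type, based on their extension."""
    buckets = defaultdict(list)
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        buckets[CONTENT_TYPES.get(suffix, "Other")].append(path)
    return dict(buckets)


def markdown_outline(text):
    """Return (level, title) pairs for each ATX heading in a Markdown file."""
    return [(len(hashes), title) for hashes, title in HEADING.findall(text)]
```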

Instruction Set 3: Guide Generation and Updates

TASK: Generate or update repository format guides

INPUTS:
- Repository analysis results
- Existing guide content (if any)
- Guide templates
- Cross-repository pattern analysis

PROCESS:
1. DETERMINE guide requirements:
   - New repository type needing new guide
   - Existing guide needing updates
   - Missing information requiring addition
2. SELECT appropriate guide template
3. POPULATE template with analysis data:
   - Repository characteristics
   - Manifest/metadata structure
   - File organization patterns
   - Content examples
   - Processing instructions
4. GENERATE natural language instructions
5. INCLUDE real examples from analyzed repositories
6. VALIDATE guide completeness and accuracy

OUTPUT:
- Updated or new repository format guide
- Migration guide (if RC to SB conversion applicable)
- Documentation of changes made
- Recommendations for further analysis
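
Template population (step 3) could be sketched with the standard library's string.Template, as below. The template text and field names are hypothetical and much shorter than a real guide template. Unfilled placeholders are left in place and reported, so a reviewer can see what the analysis did not supply.

```python
from string import Template

# Hypothetical, shortened guide template.
GUIDE_TEMPLATE = Template(
    "$title\n\n"
    "Identification\n\n$identification\n\n"
    "File Organization\n\n$file_organization\n"
)
REQUIRED_FIELDS = ("title", "identification", "file_organization")


def render_guide(analysis):
    """Fill the template from analysis data.

    Returns (text, missing), where missing lists required fields that were
    absent or empty; their placeholders stay visible in the text.
    """
    missing = [field for field in REQUIRED_FIELDS if not analysis.get(field)]
    values = {key: value for key, value in analysis.items() if value}
    return GUIDE_TEMPLATE.safe_substitute(values), missing
```

Returning the missing fields alongside the text supports the validation in step 6: a guide with a non-empty missing list should not be published as complete.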

Specific Analysis Tasks

Task 1: Resource Container Analysis

Repository Targets:

  • unfoldingWord organization repositories
  • Gateway language organization repositories (es-419_gl, fr_gl, etc.)
  • Individual user RC repositories

Analysis Focus:

  • Dublin Core metadata completeness
  • Projects array structure and patterns
  • File naming conventions
  • Content type distribution
  • Dependency relationships
  • Checking information patterns

Guide Update Triggers:

  • New RC subject types discovered
  • Changes in manifest structure
  • New file organization patterns
  • Updated cross-reference systems

Task 2: Scripture Burrito Analysis

Repository Targets:

  • BurritoTruck organization repositories
  • Other organizations using SB format
  • Individual SB repositories

Analysis Focus:

  • Flavor type usage patterns
  • Ingredient organization strategies
  • Scope definition patterns
  • Relationship modeling approaches
  • MIME type usage
  • Custom extension patterns

Guide Update Triggers:

  • New flavor types in use
  • Novel ingredient organization patterns
  • Enhanced relationship modeling
  • New MIME type usage

Task 3: Tool-Generated Repository Analysis

Repository Targets:

  • translationCore generated repositories
  • translationStudio generated repositories
  • Other tool-generated repositories

Analysis Focus:

  • Tool version evolution
  • Manifest structure changes
  • File organization variations
  • Content structure patterns
  • Tool-specific metadata

Guide Update Triggers:

  • New tool versions with structure changes
  • New tools generating repositories
  • Changes in tool-specific metadata
  • Evolution in file organization

API Endpoint Usage Strategy

Rate Limiting Management

Anonymous Access (60 requests/hour):

  • Use for initial discovery and testing
  • Suitable for small-scale analysis

Authenticated Access (1000+ requests/hour):

  • Required for comprehensive analysis
  • Use for production guide generation
  • Enables private repository access
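
To plan an analysis run against these limits, a small budget calculation helps. The sketch below uses the hourly figures quoted above as inputs rather than as verified server limits. The header format follows Gitea's token authentication scheme; the token value itself is supplied by the caller.

```python
def min_interval_seconds(requests_per_hour):
    """Seconds to wait between requests to stay within an hourly limit."""
    return 3600.0 / requests_per_hour


def repos_per_hour(requests_per_hour, requests_per_repo):
    """How many repositories can be analyzed per hour within the limit."""
    return requests_per_hour // requests_per_repo


def auth_headers(token):
    """Request headers for authenticated access with a personal access token."""
    return {"Authorization": f"token {token}"}
```

For example, if analyzing one repository takes six requests (metadata, file listing, specification file, and three content samples), the anonymous figure allows about 10 repositories per hour and the authenticated figure about 166.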

Efficient API Usage Patterns

Batch Discovery:

  1. Use catalog endpoints for published resource discovery
  2. Use organization endpoints for complete repository listings
  3. Cache repository metadata for multiple analysis passes

Content Sampling Strategy:

  1. Download specification files for all repositories first
  2. Sample representative content files (first few of each type)
  3. Use raw file URLs for large content files; the contents endpoint returns file data base64-encoded inside a JSON response, which adds overhead

Caching Strategy:

  1. Cache repository metadata (changes infrequently)
  2. Cache specification files (stable between versions)
  3. Refresh cache based on repository update timestamps
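
The timestamp-based refresh in item 3 can be sketched as an in-memory cache keyed by repository name. The sketch assumes the repository metadata exposes an update timestamp (Gitea's repository objects include an updated_at field); RepoCache is a hypothetical class, and a persistent store would be needed across runs.

```python
class RepoCache:
    """Cache analysis results until a repository's update timestamp changes."""

    def __init__(self):
        self._entries = {}  # full_name -> (updated_at, analysis)

    def get(self, full_name, updated_at):
        """Return the cached analysis, or None if absent or stale."""
        entry = self._entries.get(full_name)
        if entry is not None and entry[0] == updated_at:
            return entry[1]
        return None

    def put(self, full_name, updated_at, analysis):
        """Store an analysis together with the timestamp it was based on."""
        self._entries[full_name] = (updated_at, analysis)
```

Comparing timestamps for equality rather than ordering avoids parsing dates and also invalidates the entry if a timestamp ever moves backwards.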

Guide Generation Templates

Repository Format Guide Template

Guide Structure:

  1. Introduction - Repository type and characteristics
  2. Identification - How to detect this repository type
  3. Manifest/Metadata Structure - Expected format and fields
  4. File Organization - Directory and file patterns
  5. Content Analysis - Sample content and structure
  6. Processing Instructions - Step-by-step handling
  7. Application Integration - Preview and editing guidance
  8. Best Practices - Optimization recommendations
  9. Common Issues - Problems and solutions

Content Population:

  • Use real repository examples from analysis
  • Include actual manifest/metadata snippets
  • Show real file structures and content samples
  • Document discovered patterns and variations

Migration Guide Template

Guide Structure:

  1. Introduction - Migration scope and benefits
  2. Migration Overview - Complexity and requirements
  3. Pre-Migration Analysis - Source repository assessment
  4. Migration Steps - Step-by-step conversion process
  5. Migration Examples - Real conversion examples
  6. Validation - Quality assurance steps
  7. Best Practices - Migration optimization
  8. Common Issues - Migration problems and solutions

Content Population:

  • Use actual RC and SB examples
  • Include real field mappings
  • Show before/after conversion examples
  • Document migration benefits and limitations

Quality Assurance Instructions

Guide Validation Requirements

Content Accuracy:

  • Verify all examples use real repository data
  • Confirm API endpoints and responses are current
  • Validate file structure examples match reality
  • Check that processing instructions are complete

Completeness Checks:

  • Ensure all major repository types are covered
  • Verify migration paths exist for all RC types
  • Confirm guides address common use cases
  • Check that cross-references between guides are accurate

Natural Language Validation:

  • Ensure guides contain no programming-language-specific details
  • Verify step-by-step instructions are clear
  • Confirm examples are implementation-agnostic
  • Check that technical concepts are explained clearly

Continuous Improvement Process

Regular Analysis Schedule:

  • Monthly: Check for new repository types
  • Quarterly: Validate existing guide accuracy
  • Semi-annually: Comprehensive cross-repository analysis
  • Annually: Major guide structure review

Update Triggers:

  • New repository types discovered
  • Changes in API endpoints or responses
  • Evolution in repository structures
  • User feedback and issues reported

Implementation Recommendations

MCP System Architecture

Core Components:

  1. API Client: Handle Door43 API interactions with rate limiting
  2. Repository Analyzer: Parse and analyze repository structures
  3. Pattern Detector: Identify common patterns across repositories
  4. Guide Generator: Create and update guides based on analysis
  5. Validator: Ensure guide quality and accuracy

Data Flow:

Repository Discovery → Specification Analysis → Content Sampling → 
Pattern Recognition → Guide Generation → Validation → Publication

Error Handling and Resilience

API Error Handling:

  • Handle rate limiting with exponential backoff
  • Manage network timeouts and retries
  • Gracefully handle missing or inaccessible repositories
  • Log errors for later review and resolution
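
A minimal sketch of exponential backoff follows. RetryableError is a hypothetical exception that the API client would raise for a rate-limit response or a timeout. The delay doubles on each attempt up to a cap, with a small random jitter added; the default values are illustrative.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical: raised by the API client on rate limiting or timeouts."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call operation(), retrying on RetryableError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller log and move on
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter exists so that the waiting behavior can be replaced in tests; production callers can leave it at its default.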

Content Processing Errors:

  • Handle malformed specification files
  • Manage encoding issues in content files
  • Deal with unexpected file structures
  • Process repositories with missing standard files

Guide Generation Errors:

  • Validate generated content before publication
  • Handle template population failures
  • Manage cross-reference validation errors
  • Ensure guide completeness despite partial data

Following these instructions, an MCP system can analyze Door43 repositories and generate guides automatically, so that documentation stays current as repository structures evolve.