MCP Instructions: Door43 Repository Analyzer

Overview

This document instructs an MCP (Model Context Protocol) system on how to analyze Door43 repositories through the Door43 API, extract repository structure and content information, and create or update repository format guides automatically.

MCP Capabilities Required

Core Functions Needed

HTTP Request Capability: Make API calls to Door43 endpoints
File Processing: Read, parse, and analyze various file formats (YAML, JSON, USFM, TSV, Markdown)
Content Analysis: Extract patterns, structures, and metadata from repository content
Documentation Generation: Create and update markdown guides based on analysis
Cross-Repository Comparison: Identify patterns across multiple repositories

Data Processing Requirements

YAML Parser: For Resource Container manifest.yaml files
JSON Parser: For Scripture Burrito metadata.json and tool manifests
Text Analysis: For USFM, TSV, and Markdown content structure analysis
Pattern Recognition: Identify file naming patterns, directory structures, content types

Repository Analysis Framework

Phase 1: Repository Discovery and Classification

Step 1: Discover Target Repositories

Use Catalog API for Published Resources:

GET https://git.door43.org/api/v1/catalog/search?stage=prod

Use Gitea API for All Repositories:

GET https://git.door43.org/api/v1/orgs/unfoldingWord/repos
GET https://git.door43.org/api/v1/orgs/BurritoTruck/repos

Filter by Repository Patterns:

Resource Containers: {lang}_{resource} pattern
Tool-generated: Look for specific naming patterns
Scripture Burritos: Check organizations like BurritoTruck
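
The {lang}_{resource} filter can be sketched as a small name matcher. This is only an illustration: the regular expression is an assumption about what counts as a language code and a resource identifier, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape of a Resource Container repository name: a language code
# (optionally with hyphenated subtags such as "es-419"), an underscore,
# then a lowercase resource identifier.
RC_NAME = re.compile(
    r"^(?P<lang>[a-z]{2,3}(?:-[A-Za-z0-9]+)*)_(?P<resource>[a-z0-9]+)$"
)


def split_rc_name(repo_name: str) -> Optional[Tuple[str, str]]:
    """Return (lang, resource) if the name fits {lang}_{resource}, else None."""
    match = RC_NAME.match(repo_name)
    if match is None:
        return None
    return match.group("lang"), match.group("resource")


def filter_rc_candidates(repo_names: List[str]) -> List[str]:
    """Keep only the names that look like Resource Containers."""
    return [name for name in repo_names if split_rc_name(name) is not None]
```

A name match is only a hint. The specification files examined in Step 2 remain the deciding evidence.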

Step 2: Initial Repository Classification

For Each Repository:

Get Repository Metadata:
```
GET /api/v1/repos/{owner}/{repo}
```

Get File Listing:

GET /api/v1/repos/{owner}/{repo}/contents

Identify Specification Type:
- Look for manifest.yaml, metadata.json, manifest.json
- Download and parse specification files
- Classify as RC, Scripture Burrito, translationCore, translationStudio, or Unknown
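
A minimal sketch of the first classification pass, assuming the contents endpoint returns a list of entries that each carry a `name` and a `type` field (as Gitea-style listings do). The precedence order and the label strings are assumptions made for illustration.

```python
from typing import Iterable, Mapping

# Assumed precedence: a repository that holds several of these files is
# reported by the first match in this order.
SPEC_FILES = (
    ("manifest.yaml", "RC"),
    ("metadata.json", "SB"),
    ("manifest.json", "tool"),  # translationCore or translationStudio
)


def classify_listing(entries: Iterable[Mapping[str, str]]) -> str:
    """Classify a repository from the root entries of its file listing."""
    names = {e["name"] for e in entries if e.get("type") == "file"}
    for filename, label in SPEC_FILES:
        if filename in names:
            return label
    return "Unknown"
```

The "tool" label is deliberately coarse. Telling translationCore from translationStudio requires reading manifest.json itself.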

Phase 2: Deep Repository Analysis

Step 3: Specification Analysis

For Resource Container Repositories:

Download Manifest:

GET /api/v1/repos/{owner}/{repo}/contents/manifest.yaml

Extract Key Information:
- dublin_core.subject - Resource type
- dublin_core.type - Container type
- dublin_core.identifier - Resource identifier
- projects[] - File structure
- relation[] - Dependencies

Analyze Projects Structure:
- File naming patterns
- Content types (USFM, TSV, Markdown)
- Organization principles
- Scope and coverage
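
The field extraction above could look roughly like the following. The sketch works on a manifest that has already been parsed into a dictionary, for example with a YAML library's safe loader. The function name and the output keys are hypothetical, and the lookup of `relation` in two places is a defensive assumption.

```python
from typing import Any, Dict


def summarize_rc_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the key fields out of an already-parsed manifest.yaml."""
    dc = manifest.get("dublin_core") or {}
    projects = manifest.get("projects") or []
    # Look for relation under dublin_core first, then at the top level.
    relation = dc.get("relation") or manifest.get("relation") or []
    return {
        "subject": dc.get("subject"),
        "type": dc.get("type"),
        "identifier": dc.get("identifier"),
        "relation": list(relation),
        "project_paths": [p.get("path") for p in projects],
    }
```

Missing keys come back as `None` or empty lists rather than raising, so incomplete manifests can still be recorded and flagged.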

For Scripture Burrito Repositories:

Download Metadata:

GET /api/v1/repos/{owner}/{repo}/contents/metadata.json

Extract Key Information:
- meta.format and meta.version
- type.flavorType - Resource category
- ingredients{} - File organization
- relationships[] - Dependencies
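
As a rough illustration, the metadata fields listed above can be read with the standard JSON parser. The fallbacks here (a top-level `format` key, and `flavorType` being either a string or an object with a `name`) are assumptions added for robustness, not statements about the specification.

```python
import json
from typing import Any, Dict


def summarize_burrito(metadata_text: str) -> Dict[str, Any]:
    """Extract the key fields from the text of a metadata.json file."""
    data = json.loads(metadata_text)
    meta = data.get("meta") or {}
    flavor_type = (data.get("type") or {}).get("flavorType")
    # Assumption: flavorType may be a plain string or an object with "name".
    if isinstance(flavor_type, dict):
        flavor_type = flavor_type.get("name")
    return {
        "format": meta.get("format") or data.get("format"),
        "version": meta.get("version"),
        "flavor_type": flavor_type,
        "ingredient_paths": sorted((data.get("ingredients") or {}).keys()),
        "relationship_count": len(data.get("relationships") or []),
    }
```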

For Tool-Generated Repositories:

Download Manifest:

GET /api/v1/repos/{owner}/{repo}/contents/manifest.json

Detect Tool Type:
- tc_version or generator.name for translationCore
- package_version and generator.name for translationStudio
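
The detection rules above can be expressed as a short function over the parsed manifest.json. Treating a generator name that starts with "tc" as translationCore is an assumption about naming. The function name is hypothetical.

```python
from typing import Any, Dict


def detect_tool(manifest: Dict[str, Any]) -> str:
    """Guess which tool produced a repository from its parsed manifest.json."""
    generator = manifest.get("generator") or {}
    name = str(generator.get("name") or "").lower()
    # translationCore: tc_version present, or a generator name that
    # (by assumption) starts with "tc".
    if "tc_version" in manifest or name.startswith("tc"):
        return "translationCore"
    # translationStudio: package_version together with a generator name.
    if "package_version" in manifest and name:
        return "translationStudio"
    return "Unknown"
```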

Step 4: Content Analysis

Sample Content Files:

Download Sample Files (first few of each type):

GET /api/v1/repos/{owner}/{repo}/contents/{file_path}
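
Gitea-style contents responses for a single file carry the body in a `content` field with an `encoding` of `base64`. The sketch below assumes Door43 follows that shape. It decodes a response that has already been parsed from JSON, and the function name is hypothetical.

```python
import base64
from typing import Any, Dict


def decode_file_payload(payload: Dict[str, Any]) -> str:
    """Return the text of a single-file contents response."""
    if payload.get("type") != "file":
        raise ValueError("payload does not describe a file")
    raw = payload.get("content") or ""
    if payload.get("encoding") == "base64":
        data = base64.b64decode(raw)
    else:
        data = raw.encode("utf-8")
    # errors="replace" keeps analysis going when a file is not valid UTF-8.
    return data.decode("utf-8", errors="replace")
```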

Analyze Content Structure:
- USFM Files: Count chapters, verses, check for alignment markers
- TSV Files: Analyze headers, row structure, content type
- Markdown Files: Examine structure, headers, content organization

Extract Content Patterns:
- File size distributions
- Content complexity levels
- Cross-reference patterns
- Structural variations
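
A minimal sketch of the USFM and TSV checks, using only the standard library. It counts `\c` and `\v` markers and treats the presence of a `\zaln-s` milestone as evidence of alignment data. The function names and the shape of the returned dictionaries are illustrative choices.

```python
import csv
import io
import re
from typing import Dict, List


def usfm_stats(text: str) -> Dict[str, object]:
    """Count chapter and verse markers and look for alignment milestones."""
    return {
        "chapters": len(re.findall(r"\\c\s+\d+", text)),
        "verses": len(re.findall(r"\\v\s+\d+", text)),
        "aligned": "\\zaln-s" in text,
    }


def tsv_stats(text: str) -> Dict[str, object]:
    """Report the header row and the number of data rows in a TSV file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows: List[List[str]] = [row for row in reader if row]  # skip blank lines
    headers = rows[0] if rows else []
    return {"headers": headers, "row_count": max(len(rows) - 1, 0)}
```

A verse range such as `\v 1-2` counts once here, so the verse figure is a marker count rather than a full versification check.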

Phase 3: Pattern Recognition and Guide Generation

Step 5: Cross-Repository Pattern Analysis

Group Repositories by Type:

Resource Container by subject
Scripture Burrito by flavor type
Tool-generated by tool and version
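
The grouping above might be sketched as follows. It assumes each repository has already been reduced to a summary dictionary with a `kind` field and the relevant grouping fields. Those field names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple

# Which summary fields form the group key for each repository kind.
GROUP_FIELDS = {
    "RC": ("subject",),
    "SB": ("flavor_type",),
    "tool": ("tool", "version"),
}


def group_repositories(
    summaries: Iterable[Mapping[str, Any]],
) -> Dict[Tuple[Any, ...], List[str]]:
    """Map (kind, *key fields) to the names of the repositories in that group."""
    groups: Dict[Tuple[Any, ...], List[str]] = defaultdict(list)
    for summary in summaries:
        kind = summary.get("kind", "Unknown")
        fields = GROUP_FIELDS.get(kind, ())
        key = (kind,) + tuple(summary.get(field) for field in fields)
        groups[key].append(summary["name"])
    return dict(groups)
```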

Identify Common Patterns:

File naming conventions
Directory structures
Content organization principles
Cross-reference systems
Metadata patterns

Detect Variations:

Non-standard implementations
Version differences
Tool-specific variations
Custom extensions

Step 6: Guide Creation and Updates

Determine Guide Needs:

New repository types requiring new guides
Existing guides needing updates
Missing information in current guides
Outdated information requiring correction

Generate Guide Content:

Use repository analysis data to populate guide templates
Include real examples from analyzed repositories
Document discovered patterns and variations
Provide specific processing instructions

MCP Analysis Instructions

Repository Analysis Workflow

Instruction Set 1: Basic Repository Analysis

TASK: Analyze Door43 repository structure and classification

INPUTS:
- Repository owner and name
- Door43 API base URL: https://git.door43.org/api/v1

PROCESS:
1. GET repository metadata from /repos/{owner}/{repo}
2. GET repository contents from /repos/{owner}/{repo}/contents
3. IDENTIFY specification files (manifest.yaml, metadata.json, manifest.json)
4. DOWNLOAD and PARSE specification files
5. CLASSIFY repository type based on specification content
6. EXTRACT key metadata fields
7. ANALYZE file structure and organization
8. SAMPLE content files for structure analysis

OUTPUT:
- Repository classification: Resource Container (RC), Scripture Burrito (SB), translationCore (tC), translationStudio (tS), or Unknown
- Specification version and key fields
- File structure analysis
- Content type distribution
- Processing recommendations

Instruction Set 2: Content Structure Analysis

TASK: Analyze repository content structure and patterns

INPUTS:
- Repository classification from previous analysis
- File listing from repository contents
- Sample content files

PROCESS:
1. CATEGORIZE files by extension and naming patterns
2. DOWNLOAD sample files of each major type
3. ANALYZE content structure:
   - USFM: Extract markers, count verses/chapters, check alignment
   - TSV: Analyze headers, detect content type, count rows
   - Markdown: Examine structure, headers, content organization
   - JSON: Parse structure, identify purpose, extract key fields
4. IDENTIFY cross-reference patterns
5. DETECT content complexity levels
6. EXTRACT content characteristics

OUTPUT:
- Content type analysis
- File organization patterns
- Cross-reference systems
- Content complexity assessment
- Processing requirements

Instruction Set 3: Guide Generation and Updates

TASK: Generate or update repository format guides

INPUTS:
- Repository analysis results
- Existing guide content (if any)
- Guide templates
- Cross-repository pattern analysis

PROCESS:
1. DETERMINE guide requirements:
   - New repository type needing new guide
   - Existing guide needing updates
   - Missing information requiring addition
2. SELECT appropriate guide template
3. POPULATE template with analysis data:
   - Repository characteristics
   - Manifest/metadata structure
   - File organization patterns
   - Content examples
   - Processing instructions
4. GENERATE natural language instructions
5. INCLUDE real examples from analyzed repositories
6. VALIDATE guide completeness and accuracy

OUTPUT:
- Updated or new repository format guide
- Migration guide (if an RC-to-SB conversion applies)
- Documentation of changes made
- Recommendations for further analysis

Specific Analysis Tasks

Task 1: Resource Container Analysis

Repository Targets:

unfoldingWord organization repositories
Gateway language organization repositories (es-419_gl, fr_gl, etc.)
Individual user RC repositories

Analysis Focus:

Dublin Core metadata completeness
Projects array structure and patterns
File naming conventions
Content type distribution
Dependency relationships
Checking information patterns

Guide Update Triggers:

New RC subject types discovered
Changes in manifest structure
New file organization patterns
Updated cross-reference systems

Task 2: Scripture Burrito Analysis

Repository Targets:

BurritoTruck organization repositories
Other organizations using SB format
Individual SB repositories

Analysis Focus:

Flavor type usage patterns
Ingredient organization strategies
Scope definition patterns
Relationship modeling approaches
MIME type usage
Custom extension patterns

Guide Update Triggers:

New flavor types in use
Novel ingredient organization patterns
Enhanced relationship modeling
New MIME type usage

Task 3: Tool-Generated Repository Analysis

Repository Targets:

translationCore generated repositories
translationStudio generated repositories
Other tool-generated repositories

Analysis Focus:

Tool version evolution
Manifest structure changes
File organization variations
Content structure patterns
Tool-specific metadata

Guide Update Triggers:

New tool versions with structure changes
New tools generating repositories
Changes in tool-specific metadata
Evolution in file organization

API Endpoint Usage Strategy

Rate Limiting Management

Anonymous Access (60 requests/hour):

Use for initial discovery and testing
Suitable for small-scale analysis

Authenticated Access (1000+ requests/hour):

Required for comprehensive analysis
Use for production guide generation
Enables private repository access
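
One way to switch between anonymous and authenticated access is to build requests through a single helper. The sketch uses the Gitea-style `Authorization: token ...` header and assumes Door43 accepts it. The helper name is hypothetical, and no request is sent here.

```python
import urllib.request
from typing import Optional

API_BASE = "https://git.door43.org/api/v1"


def build_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for an API path, authenticated when a token is given."""
    headers = {"Accept": "application/json"}
    if token:
        # Gitea-style token header; assumed to apply to Door43 as well.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(API_BASE + path, headers=headers)
```

Passing the result to `urllib.request.urlopen` would perform the call. Keeping construction separate makes the authentication choice easy to test.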

Efficient API Usage Patterns

Batch Discovery:

Use catalog endpoints for published resource discovery
Use organization endpoints for complete repository listings
Cache repository metadata for multiple analysis passes

Content Sampling Strategy:

Download specification files for all repositories first
Sample representative content files (first few of each type)
Use raw URLs for large content files to avoid the overhead of base64-encoded API responses

Caching Strategy:

Cache repository metadata (changes infrequently)
Cache specification files (stable between versions)
Refresh cache based on repository update timestamps
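
A timestamp-driven cache could be sketched like this, assuming repository metadata exposes an ISO 8601 `updated_at` value in a consistent format. The class and method names are illustrative.

```python
from datetime import datetime
from typing import Any, Dict, Optional, Tuple


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class RepoCache:
    """Keep analysis results until the repository reports a newer update."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[datetime, Any]] = {}

    def put(self, full_name: str, updated_at: str, value: Any) -> None:
        self._entries[full_name] = (parse_ts(updated_at), value)

    def get(self, full_name: str, remote_updated_at: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or stale."""
        entry = self._entries.get(full_name)
        if entry is None:
            return None
        cached_at, value = entry
        if parse_ts(remote_updated_at) > cached_at:
            return None  # the repository changed after the entry was stored
        return value
```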

Guide Generation Templates

Repository Format Guide Template

Guide Structure:

Introduction - Repository type and characteristics
Identification - How to detect this repository type
Manifest/Metadata Structure - Expected format and fields
File Organization - Directory and file patterns
Content Analysis - Sample content and structure
Processing Instructions - Step-by-step handling
Application Integration - Preview and editing guidance
Best Practices - Optimization recommendations
Common Issues - Problems and solutions

Content Population:

Use real repository examples from analysis
Include actual manifest/metadata snippets
Show real file structures and content samples
Document discovered patterns and variations

Migration Guide Template

Guide Structure:

Introduction - Migration scope and benefits
Migration Overview - Complexity and requirements
Pre-Migration Analysis - Source repository assessment
Migration Steps - Step-by-step conversion process
Migration Examples - Real conversion examples
Validation - Quality assurance steps
Best Practices - Migration optimization
Common Issues - Migration problems and solutions

Content Population:

Use actual RC and SB examples
Include real field mappings
Show before/after conversion examples
Document migration benefits and limitations

Quality Assurance Instructions

Guide Validation Requirements

Content Accuracy:

Verify all examples use real repository data
Confirm API endpoints and responses are current
Validate file structure examples match reality
Check that processing instructions are complete

Completeness Checks:

Ensure all major repository types are covered
Verify migration paths exist for all RC types
Confirm guides address common use cases
Check that cross-references between guides are accurate

Natural Language Validation:

Ensure guides contain no programming-language-specific details
Verify step-by-step instructions are clear
Confirm examples are implementation-agnostic
Check that technical concepts are explained clearly

Continuous Improvement Process

Regular Analysis Schedule:

Monthly: Check for new repository types
Quarterly: Validate existing guide accuracy
Semi-annually: Comprehensive cross-repository analysis
Annually: Major guide structure review

Update Triggers:

New repository types discovered
Changes in API endpoints or responses
Evolution in repository structures
User feedback and issues reported

Implementation Recommendations

MCP System Architecture

Core Components:

API Client: Handle Door43 API interactions with rate limiting
Repository Analyzer: Parse and analyze repository structures
Pattern Detector: Identify common patterns across repositories
Guide Generator: Create and update guides based on analysis
Validator: Ensure guide quality and accuracy

Data Flow:

Repository Discovery → Specification Analysis → Content Sampling → 
Pattern Recognition → Guide Generation → Validation → Publication

Error Handling and Resilience

API Error Handling:

Handle rate limiting with exponential backoff
Manage network timeouts and retries
Gracefully handle missing or inaccessible repositories
Log errors for later review and resolution
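
Exponential backoff can be sketched with a delay generator and a retry wrapper. `RetryableError` is a hypothetical exception that the caller's fetch function would raise for rate-limit responses and timeouts. The default values are placeholders, not recommendations from Door43.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller's fetch function for rate limits and timeouts."""


def backoff_delays(
    base: float = 1.0, factor: float = 2.0, cap: float = 60.0, attempts: int = 5
) -> Iterator[float]:
    """Yield delays that grow by `factor` each time, never exceeding `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, sleeping with exponential backoff between failed attempts."""
    delays = list(backoff_delays(base=base, cap=cap, attempts=attempts - 1))
    for attempt in range(attempts):
        try:
            return call()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error for logging
            sleep(delays[attempt])
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the behaviour can be checked without real waiting.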

Content Processing Errors:

Handle malformed specification files
Manage encoding issues in content files
Deal with unexpected file structures
Process repositories with missing standard files
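
For malformed or oddly encoded specification files, a tolerant loader can return an error message instead of raising. This sketch covers JSON only. The Latin-1 fallback is an assumption chosen because it never fails, not because Door43 content is known to use it.

```python
import json
from typing import Any, Optional, Tuple


def decode_bytes(data: bytes) -> str:
    """Decode as UTF-8 (dropping a BOM if present), falling back to Latin-1."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def parse_json_spec(data: bytes) -> Tuple[Optional[Any], Optional[str]]:
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(decode_bytes(data)), None
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON: {exc.msg} (line {exc.lineno})"
```

Returning the error as data lets the analyzer log it and continue with the next repository.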

Guide Generation Errors:

Validate generated content before publication
Handle template population failures
Manage cross-reference validation errors
Ensure guide completeness despite partial data

These MCP instructions enable automated analysis and guide generation for the Door43 ecosystem, ensuring documentation stays current and comprehensive as the repository landscape evolves.
