MCP Instructions: Door43 Repository Analyzer
Overview
This document gives instructions for an MCP (Model Context Protocol) system that analyzes Door43 repositories through the Door43 API, extracts repository structure and content information, and creates or updates repository format guides.
Core Functions Needed
- HTTP Request Capability: Make API calls to Door43 endpoints
- File Processing: Read, parse, and analyze various file formats (YAML, JSON, USFM, TSV, Markdown)
- Content Analysis: Extract patterns, structures, and metadata from repository content
- Documentation Generation: Create and update markdown guides based on analysis
- Cross-Repository Comparison: Identify patterns across multiple repositories
Data Processing Requirements
- YAML Parser: For Resource Container manifest.yaml files
- JSON Parser: For Scripture Burrito metadata.json and tool manifests
- Text Analysis: For USFM, TSV, and Markdown content structure analysis
- Pattern Recognition: Identify file naming patterns, directory structures, content types
Step 1: Discover Target Repositories
Use Catalog API for Published Resources:
GET https://git.door43.org/api/v1/catalog/search?stage=prod
Use Gitea API for All Repositories:
GET https://git.door43.org/api/v1/orgs/unfoldingWord/repos
GET https://git.door43.org/api/v1/orgs/BurritoTruck/repos
Filter by Repository Patterns:
- Resource Containers: {lang}_{resource} pattern
- Tool-generated: Look for specific naming patterns
- Scripture Burritos: Check organizations like BurritoTruck
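The name-based filter above can be sketched as follows, assuming repository names put a hyphenated language code before a single underscore. The helper `split_rc_name` and its regular expression are illustrative assumptions, not part of the Door43 API, and real repositories may deviate from the pattern.

```python
import re
from typing import Optional, Tuple

# Assumed shape: language code (letters, optional hyphenated subtags),
# one underscore, then a resource identifier without underscores.
RC_NAME = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)_(?P<resource>[A-Za-z0-9-]+)$"
)


def split_rc_name(repo_name: str) -> Optional[Tuple[str, str]]:
    """Return (lang, resource) if the name fits the assumed pattern, else None."""
    match = RC_NAME.match(repo_name)
    if match is None:
        return None
    return match.group("lang"), match.group("resource")


# Hypothetical repository names used only to exercise the filter.
candidates = ["en_ult", "es-419_tn", "README", "en_ult_tit_book"]
rc_like = [name for name in candidates if split_rc_name(name)]
```

Names with extra underscores fall through this filter, so they remain candidates for the tool-generated check.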
Step 2: Initial Repository Classification
For Each Repository:
1. Get Repository Metadata:
GET /api/v1/repos/{owner}/{repo}
2. Get File Listing:
GET /api/v1/repos/{owner}/{repo}/contents
3. Identify Specification Type:
- Look for manifest.yaml, metadata.json, manifest.json
- Download and parse specification files
- Classify as RC, Scripture Burrito, translationCore, translationStudio, or Unknown
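A first-pass classification from the file listing alone might look like the sketch below. It assumes the contents endpoint returns a list of entries that each carry a `name` field. The function name and the "Tool-generated" label are illustrative, and telling translationCore from translationStudio still requires parsing manifest.json.

```python
from typing import Any, Dict, Iterable, List

# Specification file -> candidate repository family, checked in this order.
SPEC_FILES = {
    "manifest.yaml": "RC",
    "metadata.json": "Scripture Burrito",
    "manifest.json": "Tool-generated",
}


def classify_by_listing(listing: Iterable[Dict[str, Any]]) -> str:
    """Classify a repository from the entries of its root contents listing."""
    names = {entry.get("name") for entry in listing}
    for spec_file, family in SPEC_FILES.items():
        if spec_file in names:
            return family
    return "Unknown"


# Hypothetical listing, trimmed to the one field this sketch reads.
listing: List[Dict[str, Any]] = [{"name": "LICENSE.md"}, {"name": "manifest.yaml"}]
family = classify_by_listing(listing)
```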
Step 3: Specification Analysis
For Resource Container Repositories:
1. Download Manifest:
GET /api/v1/repos/{owner}/{repo}/contents/manifest.yaml
2. Extract Key Information:
- dublin_core.subject - Resource type
- dublin_core.type - Container type
- dublin_core.identifier - Resource identifier
- projects[] - File structure
- dublin_core.relation[] - Dependencies
3. Analyze Projects Structure:
- File naming patterns
- Content types (USFM, TSV, Markdown)
- Organization principles
- Scope and coverage
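Once the manifest has been parsed into a dictionary (by whatever YAML parser the system provides), the key fields can be pulled out as in this sketch. The function name is hypothetical, and the sample manifest is invented for illustration. The sketch looks for `relation` under `dublin_core` first and falls back to the top level, since either placement may be encountered.

```python
import posixpath
from typing import Any, Dict


def summarize_rc_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields named above from an already-parsed RC manifest."""
    dc = manifest.get("dublin_core") or {}
    projects = manifest.get("projects") or []
    paths = [p.get("path", "") for p in projects]
    return {
        "subject": dc.get("subject"),
        "type": dc.get("type"),
        "identifier": dc.get("identifier"),
        "relations": list(dc.get("relation") or manifest.get("relation") or []),
        "project_paths": paths,
        # File extensions hint at the content type (USFM, TSV, Markdown).
        "extensions": sorted({posixpath.splitext(p)[1] for p in paths if p}),
    }


# Invented sample data, not taken from a real repository.
sample_manifest = {
    "dublin_core": {
        "subject": "Aligned Bible",
        "type": "bundle",
        "identifier": "ult",
        "relation": ["en/tn", "en/tw"],
    },
    "projects": [
        {"identifier": "gen", "path": "./01-GEN.usfm"},
        {"identifier": "exo", "path": "./02-EXO.usfm"},
    ],
}
summary = summarize_rc_manifest(sample_manifest)
```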
For Scripture Burrito Repositories:
1. Download Metadata:
GET /api/v1/repos/{owner}/{repo}/contents/metadata.json
2. Extract Key Information:
- meta.format and meta.version
- type.flavorType - Resource category
- ingredients{} - File organization
- relationships[] - Dependencies
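A minimal sketch of the extraction step for metadata.json follows. The function name and the sample document are assumptions made for illustration. The sketch reads `format` from `meta` as described above and falls back to a top-level `format` key in case the file places it there.

```python
import json
from typing import Any, Dict


def summarize_burrito(metadata_text: str) -> Dict[str, Any]:
    """Extract format, version, flavor type, and ingredient details."""
    metadata = json.loads(metadata_text)
    meta = metadata.get("meta") or {}
    flavor_type = (metadata.get("type") or {}).get("flavorType") or {}
    ingredients = metadata.get("ingredients") or {}
    mime_types = {
        info.get("mimeType") for info in ingredients.values() if info.get("mimeType")
    }
    return {
        "format": meta.get("format", metadata.get("format")),
        "version": meta.get("version"),
        "flavor_type": flavor_type.get("name"),
        "ingredient_paths": sorted(ingredients),
        "mime_types": sorted(mime_types),
        "relationship_count": len(metadata.get("relationships") or []),
    }


# Invented sample metadata, trimmed to the fields this sketch reads.
sample = json.dumps({
    "format": "scripture burrito",
    "meta": {"version": "1.0.0"},
    "type": {"flavorType": {"name": "scripture"}},
    "ingredients": {
        "ingredients/GEN.usfm": {"mimeType": "text/x-usfm"},
        "ingredients/LICENSE.md": {"mimeType": "text/markdown"},
    },
    "relationships": [],
})
summary = summarize_burrito(sample)
```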
For Tool-Generated Repositories:
1. Download Manifest:
GET /api/v1/repos/{owner}/{repo}/contents/manifest.json
2. Detect Tool Type:
- tc_version or generator.name for translationCore
- package_version and generator.name for translationStudio
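The detection rule above could be sketched as below. Treating a generator name that starts with "tc" as a translationCore signal is an assumption of this sketch. The function name and the sample values are likewise hypothetical.

```python
from typing import Any, Dict


def detect_tool(manifest: Dict[str, Any]) -> str:
    """Guess which tool produced a parsed manifest.json."""
    generator_name = str((manifest.get("generator") or {}).get("name") or "")
    # Assumption: translationCore generator names begin with "tc".
    if "tc_version" in manifest or generator_name.lower().startswith("tc"):
        return "translationCore"
    if "package_version" in manifest and generator_name:
        return "translationStudio"
    return "Unknown"


# Invented manifests used only to exercise each branch.
tc_manifest = {"tc_version": 7, "generator": {"name": "tc-desktop"}}
ts_manifest = {"package_version": 6, "generator": {"name": "ts-android"}}
```

The translationCore check runs first so that a manifest carrying both `tc_version` and `package_version` is attributed to translationCore.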
Step 4: Content Analysis
Sample Content Files:
1. Download Sample Files (first few of each type):
GET /api/v1/repos/{owner}/{repo}/contents/{file_path}
2. Analyze Content Structure:
- USFM Files: Count chapters, verses, check for alignment markers
- TSV Files: Analyze headers, row structure, content type
- Markdown Files: Examine structure, headers, content organization
3. Extract Content Patterns:
- File size distributions
- Content complexity levels
- Cross-reference patterns
- Structural variations
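For the USFM part of this step, a rough count of chapter and verse markers can be obtained with regular expressions, as in the sketch below. It assumes alignment data appears as `\zaln-s` milestones and counts a verse range such as `\v 1-2` as a single marker. The function name and sample text are illustrative.

```python
import re
from typing import Dict, Union


def summarize_usfm(text: str) -> Dict[str, Union[int, bool]]:
    """Count chapter and verse markers and check for alignment milestones."""
    chapters = re.findall(r"\\c\s+\d+", text)
    verses = re.findall(r"\\v\s+\d+", text)
    return {
        "chapters": len(chapters),
        "verses": len(verses),
        # Assumption: aligned text carries \zaln-s start milestones.
        "aligned": "\\zaln-s" in text,
    }


# Invented sample, far shorter than a real book file.
sample = r"""\id TIT
\c 1
\p
\v 1 Paul, a servant
\v 2 in hope of life
\c 2
\v 1 But you
"""
summary = summarize_usfm(sample)
```

This is a marker count, not a USFM parser. It is intended for quick sampling rather than validation.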
Step 5: Cross-Repository Pattern Analysis
Group Repositories by Type:
- Resource Container by subject
- Scripture Burrito by flavor type
- Tool-generated by tool and version
Identify Common Patterns:
- File naming conventions
- Directory structures
- Content organization principles
- Cross-reference systems
- Metadata patterns
Detect Variations:
- Non-standard implementations
- Version differences
- Tool-specific variations
- Custom extensions
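One way to approach the grouping and variation steps is to tally a simple feature, such as file extension, per repository type and flag values that fall below a share threshold. The function names and the 10% threshold in this sketch are arbitrary assumptions.

```python
import posixpath
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def group_extensions(repos: Iterable[Tuple[str, List[str]]]) -> Dict[str, Counter]:
    """Tally file extensions per repository type."""
    by_type: Dict[str, Counter] = defaultdict(Counter)
    for repo_type, file_names in repos:
        for name in file_names:
            ext = posixpath.splitext(name)[1].lstrip(".").lower() or "(none)"
            by_type[repo_type][ext] += 1
    return by_type


def rare_patterns(counter: Counter, threshold: float = 0.1) -> List[str]:
    """Return values whose share of the total is below the threshold."""
    total = sum(counter.values())
    return sorted(k for k, n in counter.items() if total and n / total < threshold)


# Invented listings: ten USFM files plus one outlier in an RC repository.
repos = [
    ("RC", [f"{i:02d}-BOOK.usfm" for i in range(1, 11)] + ["notes.docx"]),
    ("SB", ["metadata.json"]),
]
grouped = group_extensions(repos)
outliers = rare_patterns(grouped["RC"])
```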
Step 6: Guide Creation and Updates
Determine Guide Needs:
- New repository types requiring new guides
- Existing guides needing updates
- Missing information in current guides
- Outdated information requiring correction
Generate Guide Content:
- Use repository analysis data to populate guide templates
- Include real examples from analyzed repositories
- Document discovered patterns and variations
- Provide specific processing instructions
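Deciding which guides to create or update can be reduced to a set comparison between what the analysis observed and what the guides already document. The sketch below assumes both sides are available as mappings from repository type to pattern names. The function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def plan_guide_work(
    observed: Dict[str, Iterable[str]],
    documented: Dict[str, Iterable[str]],
) -> Dict[str, object]:
    """List types needing a new guide and patterns missing from existing guides."""
    create: List[str] = []
    update: Dict[str, List[str]] = {}
    for repo_type in sorted(observed):
        if repo_type not in documented:
            create.append(repo_type)
            continue
        missing = sorted(set(observed[repo_type]) - set(documented[repo_type]))
        if missing:
            update[repo_type] = missing
    return {"create": create, "update": update}


# Invented pattern names for illustration.
observed = {"RC": {"usfm-books", "tsv-notes"}, "SB": {"ingredients-dir"}}
documented = {"RC": {"usfm-books"}}
plan = plan_guide_work(observed, documented)
```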
Instruction Set 1: Basic Repository Analysis
TASK: Analyze Door43 repository structure and classification
INPUTS:
- Repository owner and name
- Door43 API base URL: https://git.door43.org/api/v1
PROCESS:
1. GET repository metadata from /repos/{owner}/{repo}
2. GET repository contents from /repos/{owner}/{repo}/contents
3. IDENTIFY specification files (manifest.yaml, metadata.json, manifest.json)
4. DOWNLOAD and PARSE specification files
5. CLASSIFY repository type based on specification content
6. EXTRACT key metadata fields
7. ANALYZE file structure and organization
8. SAMPLE content files for structure analysis
OUTPUT:
- Repository classification (RC, SB, tC, tS, Unknown)
- Specification version and key fields
- File structure analysis
- Content type distribution
- Processing recommendations
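Steps 1 to 3 of the process above can be sketched with the HTTP layer injected as a callable, which keeps the logic testable without network access. The function `analyze_repository`, the `fetch_json` parameter, and the stand-in fetcher are assumptions of this sketch. It also assumes the repository metadata includes a `full_name` field and that contents entries include `name`.

```python
import posixpath
from typing import Any, Callable, Dict

BASE_URL = "https://git.door43.org/api/v1"
SPEC_NAMES = {"manifest.yaml", "metadata.json", "manifest.json"}


def analyze_repository(
    owner: str, repo: str, fetch_json: Callable[[str], Any]
) -> Dict[str, Any]:
    """Fetch metadata and the root listing, then report spec files and extensions."""
    repo_url = f"{BASE_URL}/repos/{owner}/{repo}"
    metadata = fetch_json(repo_url)                 # step 1
    listing = fetch_json(f"{repo_url}/contents")    # step 2
    names = {entry["name"] for entry in listing}
    extensions = {posixpath.splitext(n)[1].lstrip(".") for n in names}
    return {
        "full_name": metadata.get("full_name"),
        "spec_files": sorted(names & SPEC_NAMES),   # step 3
        "extensions": sorted(e for e in extensions if e),
    }


def fake_fetch(url: str) -> Any:
    """Stand-in for a real HTTP client, returning invented responses."""
    if url.endswith("/contents"):
        return [{"name": "manifest.yaml"}, {"name": "01-GEN.usfm"}, {"name": "LICENSE.md"}]
    return {"full_name": "unfoldingWord/en_ult"}


result = analyze_repository("unfoldingWord", "en_ult", fake_fetch)
```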
Instruction Set 2: Content Structure Analysis
TASK: Analyze repository content structure and patterns
INPUTS:
- Repository classification from previous analysis
- File listing from repository contents
- Sample content files
PROCESS:
1. CATEGORIZE files by extension and naming patterns
2. DOWNLOAD sample files of each major type
3. ANALYZE content structure:
- USFM: Extract markers, count verses/chapters, check alignment
- TSV: Analyze headers, detect content type, count rows
- Markdown: Examine structure, headers, content organization
- JSON: Parse structure, identify purpose, extract key fields
4. IDENTIFY cross-reference patterns
5. DETECT content complexity levels
6. EXTRACT content characteristics
OUTPUT:
- Content type analysis
- File organization patterns
- Cross-reference systems
- Content complexity assessment
- Processing requirements
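As one instance of the structure analysis in step 3, a TSV file can be summarized by its header row, data row count, and the number of rows whose column count differs from the header. The function name and the sample columns are illustrative assumptions.

```python
import csv
import io
from typing import Any, Dict


def summarize_tsv(text: str) -> Dict[str, Any]:
    """Report headers, data row count, and rows with an unexpected column count."""
    # QUOTE_NONE keeps quotation marks in the content as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    if not rows:
        return {"headers": [], "rows": 0, "ragged_rows": 0}
    headers, body = rows[0], rows[1:]
    ragged = sum(1 for row in body if len(row) != len(headers))
    return {"headers": headers, "rows": len(body), "ragged_rows": ragged}


# Invented sample with one row that is missing a column.
sample = "Reference\tID\tNote\n1:1\tabc1\tFirst note\n1:2\tdef2\n"
summary = summarize_tsv(sample)
```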
Instruction Set 3: Guide Generation and Updates
TASK: Generate or update repository format guides
INPUTS:
- Repository analysis results
- Existing guide content (if any)
- Guide templates
- Cross-repository pattern analysis
PROCESS:
1. DETERMINE guide requirements:
- New repository type needing new guide
- Existing guide needing updates
- Missing information requiring addition
2. SELECT appropriate guide template
3. POPULATE template with analysis data:
- Repository characteristics
- Manifest/metadata structure
- File organization patterns
- Content examples
- Processing instructions
4. GENERATE natural language instructions
5. INCLUDE real examples from analyzed repositories
6. VALIDATE guide completeness and accuracy
OUTPUT:
- Updated or new repository format guide
- Migration guide (if RC to SB conversion applicable)
- Documentation of changes made
- Recommendations for further analysis
Task 1: Resource Container Analysis
Repository Targets:
- unfoldingWord organization repositories
- Gateway language organization repositories (es-419_gl, fr_gl, etc.)
- Individual user RC repositories
Analysis Focus:
- Dublin Core metadata completeness
- Projects array structure and patterns
- File naming conventions
- Content type distribution
- Dependency relationships
- Checking information patterns
Guide Update Triggers:
- New RC subject types discovered
- Changes in manifest structure
- New file organization patterns
- Updated cross-reference systems
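Dublin Core completeness can be checked by comparing a parsed manifest against a list of expected fields. The field list below is an assumption chosen for illustration, not the authoritative set of required fields, and should be replaced with the list the applicable specification version defines.

```python
from typing import Any, Dict, List, Sequence

# Illustrative field list only; consult the RC specification for the real one.
EXPECTED_FIELDS = ("identifier", "language", "subject", "title", "type", "version")


def dublin_core_gaps(
    manifest: Dict[str, Any], expected: Sequence[str] = EXPECTED_FIELDS
) -> List[str]:
    """Return expected dublin_core fields that are absent or empty."""
    dc = manifest.get("dublin_core") or {}
    return [name for name in expected if dc.get(name) in (None, "", [], {})]


# Invented manifest with one empty and two missing fields.
sample_manifest = {
    "dublin_core": {
        "identifier": "ult",
        "subject": "Aligned Bible",
        "title": "",
        "type": "bundle",
    }
}
gaps = dublin_core_gaps(sample_manifest)
```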
Task 2: Scripture Burrito Analysis
Repository Targets:
- BurritoTruck organization repositories
- Other organizations using SB format
- Individual SB repositories
Analysis Focus:
- Flavor type usage patterns
- Ingredient organization strategies
- Scope definition patterns
- Relationship modeling approaches
- MIME type usage
- Custom extension patterns
Guide Update Triggers:
- New flavor types in use
- Novel ingredient organization patterns
- Enhanced relationship modeling
- New MIME type usage
Task 3: Tool-Generated Repository Analysis
Repository Targets:
- translationCore generated repositories
- translationStudio generated repositories
- Other tool-generated repositories
Analysis Focus:
- Tool version evolution
- Manifest structure changes
- File organization variations
- Content structure patterns
- Tool-specific metadata
Guide Update Triggers:
- New tool versions with structure changes
- New tools generating repositories
- Changes in tool-specific metadata
- Evolution in file organization
Rate Limiting Management
Anonymous Access (60 requests/hour):
- Use for initial discovery and testing
- Suitable for small-scale analysis
Authenticated Access (1000+ requests/hour):
- Required for comprehensive analysis
- Use for production guide generation
- Enables private repository access
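Given an hourly budget, the minimum spacing between requests follows directly from dividing the hour by the limit. The sketch below also builds request headers, assuming token authentication through an `Authorization: token ...` header. The function names are hypothetical and the limits are the figures quoted above.

```python
from typing import Dict, Optional


def request_interval_seconds(requests_per_hour: int) -> float:
    """Minimum delay between requests that stays within an hourly budget."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600.0 / requests_per_hour


def build_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding the token only for authenticated access."""
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the server accepts "token <value>" authorization.
        headers["Authorization"] = f"token {token}"
    return headers


anonymous_interval = request_interval_seconds(60)       # one request per minute
authenticated_interval = request_interval_seconds(1000)
```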
Efficient API Usage Patterns
Batch Discovery:
- Use catalog endpoints for published resource discovery
- Use organization endpoints for complete repository listings
- Cache repository metadata for multiple analysis passes
Content Sampling Strategy:
- Download specification files for all repositories first
- Sample representative content files (first few of each type)
- Use raw URLs for large content files to avoid API overhead
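The sampling strategy can be sketched as picking the first few paths per extension and decoding whatever the contents endpoint returns. This assumes a contents response carries `content` and `encoding` fields, with base64 as the encoding. The function names and the per-type default of 3 are arbitrary choices.

```python
import base64
import posixpath
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def sample_paths(paths: Iterable[str], per_type: int = 3) -> Dict[str, List[str]]:
    """Keep the first few paths (in sorted order) for each file extension."""
    picked: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(paths):
        ext = posixpath.splitext(path)[1].lstrip(".").lower()
        if len(picked[ext]) < per_type:
            picked[ext].append(path)
    return dict(picked)


def decode_contents(entry: Dict[str, Any]) -> str:
    """Return file text from a contents response, decoding base64 when flagged."""
    if entry.get("encoding") == "base64":
        return base64.b64decode(entry["content"]).decode("utf-8")
    return entry.get("content") or ""


# Invented paths and an invented response body.
paths = ["01-GEN.usfm", "02-EXO.usfm", "03-LEV.usfm", "04-NUM.usfm", "manifest.yaml"]
samples = sample_paths(paths)
text = decode_contents({"encoding": "base64", "content": "XGlkIEdFTg=="})
```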
Caching Strategy:
- Cache repository metadata (changes infrequently)
- Cache specification files (stable between versions)
- Refresh cache based on repository update timestamps
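A timestamp-keyed cache is one simple way to implement the refresh rule above. The sketch assumes each repository's metadata exposes an update timestamp (shown here as an `updated_at` string). The `RepoCache` class is hypothetical and keeps everything in memory.

```python
from typing import Any, Dict, Optional, Tuple


class RepoCache:
    """In-memory cache that treats an entry as stale when the timestamp changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Any]] = {}

    def get(self, full_name: str, updated_at: str) -> Optional[Any]:
        """Return the cached analysis only if it was stored for this timestamp."""
        entry = self._entries.get(full_name)
        if entry is not None and entry[0] == updated_at:
            return entry[1]
        return None

    def put(self, full_name: str, updated_at: str, analysis: Any) -> None:
        """Store an analysis result together with the timestamp it reflects."""
        self._entries[full_name] = (updated_at, analysis)


# Invented repository name and timestamps.
cache = RepoCache()
cache.put("unfoldingWord/en_ult", "2024-01-01T00:00:00Z", {"type": "RC"})
hit = cache.get("unfoldingWord/en_ult", "2024-01-01T00:00:00Z")
miss = cache.get("unfoldingWord/en_ult", "2024-02-01T00:00:00Z")
```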
Repository Format Guide Template
Guide Structure:
- Introduction - Repository type and characteristics
- Identification - How to detect this repository type
- Manifest/Metadata Structure - Expected format and fields
- File Organization - Directory and file patterns
- Content Analysis - Sample content and structure
- Processing Instructions - Step-by-step handling
- Application Integration - Preview and editing guidance
- Best Practices - Optimization recommendations
- Common Issues - Problems and solutions
Content Population:
- Use real repository examples from analysis
- Include actual manifest/metadata snippets
- Show real file structures and content samples
- Document discovered patterns and variations
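The guide structure above lends itself to a simple skeleton renderer that fills in sections with analysis data and marks the rest as pending. The sketch below emits Markdown headings. The function name and the placeholder text are assumptions.

```python
from typing import Dict

# Section titles taken from the guide structure listed above.
GUIDE_SECTIONS = [
    "Introduction",
    "Identification",
    "Manifest/Metadata Structure",
    "File Organization",
    "Content Analysis",
    "Processing Instructions",
    "Application Integration",
    "Best Practices",
    "Common Issues",
]
PLACEHOLDER = "TODO: no analysis data yet"


def render_guide(title: str, content: Dict[str, str]) -> str:
    """Render a Markdown guide skeleton, marking sections without data."""
    lines = [f"# {title}", ""]
    for section in GUIDE_SECTIONS:
        lines.extend([f"## {section}", "", content.get(section, PLACEHOLDER), ""])
    return "\n".join(lines)


guide = render_guide(
    "Resource Container Guide",
    {"Introduction": "Resource Containers are identified by manifest.yaml."},
)
```

Counting the remaining placeholders gives a quick measure of how complete a generated guide is.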
Migration Guide Template
Guide Structure:
- Introduction - Migration scope and benefits
- Migration Overview - Complexity and requirements
- Pre-Migration Analysis - Source repository assessment
- Migration Steps - Step-by-step conversion process
- Migration Examples - Real conversion examples
- Validation - Quality assurance steps
- Best Practices - Migration optimization
- Common Issues - Migration problems and solutions
Content Population:
- Use actual RC and SB examples
- Include real field mappings
- Show before/after conversion examples
- Document migration benefits and limitations
Guide Validation Requirements
Content Accuracy:
- Verify all examples use real repository data
- Confirm API endpoints and responses are current
- Validate file structure examples match reality
- Check that processing instructions are complete
Completeness Checks:
- Ensure all major repository types are covered
- Verify migration paths exist for all RC types
- Confirm guides address common use cases
- Check that cross-references between guides are accurate
Natural Language Validation:
- Ensure guides contain no programming-language-specific details
- Verify step-by-step instructions are clear
- Confirm examples are implementation-agnostic
- Check that technical concepts are explained clearly
Continuous Improvement Process
Regular Analysis Schedule:
- Monthly: Check for new repository types
- Quarterly: Validate existing guide accuracy
- Semi-annually: Comprehensive cross-repository analysis
- Annually: Major guide structure review
Update Triggers:
- New repository types discovered
- Changes in API endpoints or responses
- Evolution in repository structures
- User feedback and issues reported
MCP System Architecture
Core Components:
- API Client: Handle Door43 API interactions with rate limiting
- Repository Analyzer: Parse and analyze repository structures
- Pattern Detector: Identify common patterns across repositories
- Guide Generator: Create and update guides based on analysis
- Validator: Ensure guide quality and accuracy
Data Flow:
Repository Discovery → Specification Analysis → Content Sampling →
Pattern Recognition → Guide Generation → Validation → Publication
Error Handling and Resilience
API Error Handling:
- Handle rate limiting with exponential backoff
- Manage network timeouts and retries
- Gracefully handle missing or inaccessible repositories
- Log errors for later review and resolution
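Exponential backoff can be sketched as a wrapper that retries an operation and doubles the delay after each failure. The `RateLimited` exception is a hypothetical error that a client might raise on a rate-limit response. The retry count and base delay are arbitrary defaults, and the sleep function is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a client when the server rate-limits it."""


def call_with_backoff(
    operation: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (RateLimited, TimeoutError, ConnectionError),
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run operation, retrying listed errors with delays of 1x, 2x, 4x, and so on."""
    attempt = 0
    while True:
        try:
            return operation()
        except retry_on:
            if attempt >= retries:
                raise  # Out of retries: let the caller log and handle it.
            sleep(base_delay * 2 ** attempt)
            attempt += 1


# Stand-in operation that fails twice before succeeding.
attempts = []
delays = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited("slow down")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
```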
Content Processing Errors:
- Handle malformed specification files
- Manage encoding issues in content files
- Deal with unexpected file structures
- Process repositories with missing standard files
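Malformed specification files and encoding problems can be handled by returning an error description instead of raising, so one bad file does not stop the run. The function names are hypothetical. The Latin-1 fallback is an assumption that never fails to decode but may misrepresent characters, so its use is reported back to the caller.

```python
import json
from typing import Any, Optional, Tuple


def decode_text(raw: bytes) -> Tuple[str, str]:
    """Decode bytes as UTF-8 (dropping any BOM), falling back to Latin-1."""
    try:
        return raw.decode("utf-8-sig"), "utf-8-sig"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


def parse_json_or_error(text: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success or (None, message) on malformed JSON."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as error:
        return None, f"line {error.lineno}, column {error.colno}: {error.msg}"


text, encoding = decode_text(b"\xef\xbb\xbf{\"a\": 1}")
data, problem = parse_json_or_error(text)
bad_data, bad_problem = parse_json_or_error('{"a": }')
```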
Guide Generation Errors:
- Validate generated content before publication
- Handle template population failures
- Manage cross-reference validation errors
- Ensure guide completeness despite partial data
These MCP instructions enable automated analysis and guide generation for the Door43 ecosystem, ensuring documentation stays current and comprehensive as the repository landscape evolves.