High Priority
Deploy Community Index Protocol (/community.txt)
Establish a machine-readable summary of your entire community hierarchy, including key discussion areas, documentation, and resource hubs, specifically for AI agents and LLMs.
Create a text file at the root of your domain (e.g., yourcommunity.com/community.txt) with a brief introduction to your community's purpose and scope.
Include markdown-style links pointing to your most critical community sections: core documentation, popular forums/channels, contribution guides, and API references.
Add a 'FAQ' section within the file that answers the questions AI agents most often need resolved about your community's structure, governance, and contribution process.
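The three steps above can be sketched as a small generator script. This is only an illustration: the file layout (a title line, a "Key sections" list, and Q/A pairs) is an assumed convention rather than a published specification, and the function name `build_community_txt`, the community name, and every URL under yourcommunity.com are placeholders.

```python
def build_community_txt(name, intro, links, faq):
    """Return the text of a community.txt file.

    The layout below is an assumption for illustration, not a formal standard.
    links: list of (title, url) pairs for the most critical sections.
    faq:   list of (question, answer) pairs.
    """
    lines = [f"# {name}", "", intro, "", "## Key sections"]
    # Markdown-style links, one per line
    lines += [f"- [{title}]({url})" for title, url in links]
    lines += ["", "## FAQ"]
    for question, answer in faq:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    # Single trailing newline, no trailing blank lines
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example values
EXAMPLE = build_community_txt(
    name="Example Dev Community",
    intro="A community for developers building on the Example platform.",
    links=[
        ("Core documentation", "https://yourcommunity.com/docs/"),
        ("Contribution guide", "https://yourcommunity.com/contributing/guides/"),
        ("API reference", "https://yourcommunity.com/docs/api/v1/"),
    ],
    faq=[
        ("How do I contribute?", "Read the contribution guide, then open a pull request."),
    ],
)

if __name__ == "__main__":
    # Write the result to the file you will serve at /community.txt
    with open("community.txt", "w", encoding="utf-8") as handle:
        handle.write(EXAMPLE)
```

Generating the file from structured data keeps the links and FAQ entries easy to update whenever the community hierarchy changes.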


High Priority
AI Agent Selective Ingestion Control
Fine-tune which specific sections of your developer community platform (e.g., forums, docs, code repos, issue trackers) should be indexed or ingested by AI crawlers and LLM agents.
Add a `User-agent: *` group with `Disallow: /private/` and `Disallow: /user-uploads/` to your primary robots.txt to keep compliant crawlers out of sensitive areas.
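For specific agents like `GPTBot` or custom dev-focused crawlers, define granular `Allow` and `Disallow` rules for sections like `/docs/api/v1/`, `/forum/topics/popular/`, or `/contributing/guides/`. A crawler that matches its own group ignores the `*` group, so repeat any site-wide `Disallow` rules inside each agent-specific group.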
Use the robots.txt report in Google Search Console, or a custom script, to verify that crawlers interpret and apply your permissions as intended.
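A custom script check of the kind mentioned above might look like the following sketch, which uses Python's standard `urllib.robotparser`. The rule set is an assumed example built from the paths named in the steps, and `SomeOtherBot` is a made-up agent name. Note that Python's parser applies the first matching rule in file order, while many crawlers use longest-match precedence; the sample lists the `Allow` lines before the catch-all `Disallow: /` so both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; adapt the paths to your own platform.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /user-uploads/

User-agent: GPTBot
Allow: /docs/api/v1/
Allow: /forum/topics/popular/
Allow: /contributing/guides/
Disallow: /
"""


def is_allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    checks = [
        ("GPTBot", "/docs/api/v1/users"),
        ("GPTBot", "/private/notes"),
        ("SomeOtherBot", "/user-uploads/avatar.png"),
        ("SomeOtherBot", "/docs/api/v1/users"),
    ]
    for agent, path in checks:
        verdict = "allowed" if is_allowed(agent, path) else "blocked"
        print(f"{agent:13} {path:28} {verdict}")
```

Running a check like this in CI whenever robots.txt changes helps catch a rule that accidentally opens a private area or hides the documentation.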
Medium Priority
Semantic HTML for Developer Content Ingestion
Leverage HTML5 semantic elements and ARIA attributes to clearly define the structure and hierarchy of technical content, enabling LLM scrapers to accurately parse code snippets, documentation, and discussions.
Wrap primary content blocks, such as API documentation pages or detailed tutorials, within `<article>` tags to signify distinct, self-contained pieces of information.
Use `<section>` elements with descriptive `aria-label` attributes (e.g., `aria-label="API Endpoint Reference"`, `aria-label="Troubleshooting Guide"`) to delineate logical groupings of content within a page.
Ensure all data tables, especially those displaying code versions, dependencies, or performance metrics, use proper `<thead>`, `<tbody>`, and `<th>` tags for structured data extraction.
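To show why this markup matters, here is a minimal sketch of how a simple scraper could recover page structure from the elements described above. It uses Python's standard `html.parser`; the sample page, the class name `StructureExtractor`, and the table contents are all invented for illustration, and real LLM scrapers may work quite differently.

```python
from html.parser import HTMLParser

# Hypothetical page that follows the three steps above
SAMPLE_PAGE = """
<article>
  <h1>Widget API</h1>
  <section aria-label="API Endpoint Reference">
    <table>
      <thead><tr><th>Version</th><th>Dependency</th></tr></thead>
      <tbody><tr><td>1.2.0</td><td>libfoo 3.x</td></tr></tbody>
    </table>
  </section>
  <section aria-label="Troubleshooting Guide">
    <p>Restart the worker if the queue stalls.</p>
  </section>
</article>
"""


class StructureExtractor(HTMLParser):
    """Collect article count, labelled sections, and table header cells."""

    def __init__(self):
        super().__init__()
        self.article_count = 0
        self.section_labels = []
        self.table_headers = []
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_count += 1
        elif tag == "section":
            label = dict(attrs).get("aria-label")
            if label:
                self.section_labels.append(label)
        elif tag == "th":
            self._in_th = True

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False

    def handle_data(self, data):
        # Only text inside <th> cells is recorded as a column header
        if self._in_th and data.strip():
            self.table_headers.append(data.strip())


def extract_structure(html):
    extractor = StructureExtractor()
    extractor.feed(html)
    return extractor


if __name__ == "__main__":
    result = extract_structure(SAMPLE_PAGE)
    print(result.article_count, result.section_labels, result.table_headers)
```

Without the `aria-label` attributes and `<th>` cells, the same scraper would have to guess section boundaries and column meanings from visual layout alone.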
High Priority
RAG-Friendly Snippet Optimization for Technical Answers
Structure your community content, particularly Q&A sections and documentation, so that individual pieces of information can be easily 'chunked' and retrieved by Retrieval-Augmented Generation (RAG) pipelines for accurate AI responses.
Keep each concept together with its code example in a single logical content container, ideally no longer than roughly 500-700 tokens, so a retriever can return the whole unit in one chunk.
Minimize 'floating' context by explicitly stating the primary subject or function within each snippet's summary or introductory sentence, avoiding reliance on external context.
Eliminate ambiguous pronouns (e.g., 'it', 'this', 'they') and replace them with specific technical terms, function names, or component identifiers to ensure clarity for AI processing.
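The chunk-size and pronoun guidance above can be checked mechanically. The sketch below is a rough approximation under stated assumptions: it counts whitespace-separated words as a stand-in for tokens (a real tokenizer will give different counts), uses 600 as an assumed budget inside the 500-700 range, and only flags chunks whose first word is one of the pronouns listed above. All function names are hypothetical.

```python
import re

MAX_TOKENS = 600  # assumption: a value inside the suggested 500-700 range
AMBIGUOUS_PRONOUNS = {"it", "this", "they"}


def approx_tokens(text):
    """Crude token estimate: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def chunk_paragraphs(text, max_tokens=MAX_TOKENS):
    """Group blank-line-separated paragraphs into chunks within the budget.

    A single paragraph larger than the budget is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        size = approx_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and count + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def starts_with_ambiguous_pronoun(chunk):
    """Flag chunks whose first word depends on context outside the chunk."""
    words = re.findall(r"[A-Za-z']+", chunk)
    return bool(words) and words[0].lower() in AMBIGUOUS_PRONOUNS


if __name__ == "__main__":
    sample = "It retries three times.\n\nThe upload_file() helper retries three times."
    for chunk in chunk_paragraphs(sample, max_tokens=5):
        flag = "REVIEW" if starts_with_ambiguous_pronoun(chunk) else "ok"
        print(f"[{flag}] {chunk}")
```

A check like this could run over exported documentation or Q&A answers to list the snippets that need a clearer opening sentence before they are indexed.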