High Priority
Implement /ai-guidelines.txt Protocol
Establish a machine-readable manifest of your corporate knowledge base and data access policies specifically for AI agents and enterprise LLM integrations.
Create a text file at /ai-guidelines.txt with a concise overview of your company's core operational domains and data governance principles.
Include markdown-style links to key enterprise resource pages, regulatory compliance documents, and authoritative internal knowledge repositories.
Add a 'Data Access Policy' section to directly address common AI agent queries regarding data provenance, permissible use cases, and security classifications.
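The three steps above can be sketched as a small generator for the file's contents. This is a minimal illustration, not a defined standard: the heading layout, the `Link` type, the `render_ai_guidelines` function, and every example value are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A markdown-style link to a key resource (hypothetical helper type)."""
    title: str
    url: str


def render_ai_guidelines(overview: str, links: list[Link], policy: dict[str, str]) -> str:
    """Render the manifest text: overview, resource links, then the policy section."""
    lines = ["# AI Guidelines", "", overview.strip(), "", "## Key Resources"]
    lines += [f"- [{link.title}]({link.url})" for link in links]
    lines += ["", "## Data Access Policy"]
    lines += [f"- {topic}: {answer}" for topic, answer in policy.items()]
    return "\n".join(lines) + "\n"


# Hypothetical example values; replace them with your organization's own content.
manifest = render_ai_guidelines(
    overview="Example Corp operates in logistics and payments.",
    links=[Link("Compliance documents", "https://example.com/compliance/")],
    policy={
        "Data provenance": "Published figures come from audited internal systems.",
        "Permissible use": "Retrieval and summarization only.",
        "Security classification": "Only content marked Public is listed here.",
    },
)
```

Writing `manifest` to the path served at /ai-guidelines.txt is left to your deployment tooling.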


High Priority
Enterprise LLM Selective Indexing
Specify which segments of your corporate data landscape internal or partner LLM crawlers may ingest, ensuring data integrity and compliance.
Configure `User-agent` directives for specific enterprise LLM identifiers (e.g., `User-agent: InternalCorpLLMBot`), followed by the path rules for that agent: `Allow: /financial-reports/`, `Allow: /product-lifecycle-data/`, and `Disallow: /employee-private-data/`.
Utilize your internal security and access control tools to verify crawler permissions and data access scopes.
Monitor ingestion patterns in your enterprise data lake and security logs to confirm LLM agents are accessing designated data nodes and adhering to access controls.
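One way to check the directives above before deploying them, and to flag logged requests that fall outside them, is Python's standard `urllib.robotparser`. The sketch below is illustrative only: `build_parser`, `audit_paths`, and the sample paths are hypothetical, and the rule text simply restates the example directives.

```python
from urllib.robotparser import RobotFileParser

# The example directives from this section, written as robots.txt content.
ROBOTS_TXT = """\
User-agent: InternalCorpLLMBot
Allow: /financial-reports/
Allow: /product-lifecycle-data/
Disallow: /employee-private-data/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def audit_paths(parser: RobotFileParser, agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to True if the rules permit `agent` to fetch it."""
    return {path: parser.can_fetch(agent, path) for path in paths}


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, e.g. taken from access logs for this crawler.
report = audit_paths(
    parser,
    "InternalCorpLLMBot",
    ["/financial-reports/2023/q4.html", "/employee-private-data/payroll.csv"],
)
violations = [path for path, allowed in report.items() if not allowed]
```

These directives are advisory, so a check like this complements the access-control verification described above rather than replacing it.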
Medium Priority
Semantic Data Structure for Enterprise Knowledge Graphs
Leverage semantic HTML and structured data markup to enable LLM scrapers to accurately interpret the relationships and hierarchy within your corporate information assets.
Wrap each core business process document and policy document in `<article>` tags to mark it as a self-contained, authoritative unit of content.
Utilize `<section>` elements with precise `aria-label` attributes for distinct business units, product lines, or strategic initiatives.
Ensure all tabular data, particularly financial statements or operational metrics, employ proper `<thead>`, `<tbody>`, and `<th>` tags for robust structured data extraction and schema adherence.
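The table requirement above can be checked automatically. The following is a rough sketch using Python's standard `html.parser`; the `TableAudit` class and `missing_table_tags` function are hypothetical names, and the check only confirms that the tags are present, not that they are used correctly.

```python
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Record, for each <table>, which required structural tags appear inside it."""

    REQUIRED = {"thead", "tbody", "th"}

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[set[str]] = []  # tags seen in each finished table
        self._open: list[set[str]] = []   # stack, so nested tables are handled

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append(set())
        elif self._open and tag in self.REQUIRED:
            self._open[-1].add(tag)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self.tables.append(self._open.pop())


def missing_table_tags(html: str) -> list[set[str]]:
    """Return, per table in closing order, the required tags it lacks."""
    audit = TableAudit()
    audit.feed(html)
    audit.close()
    return [TableAudit.REQUIRED - seen for seen in audit.tables]


# Hypothetical sample markup for illustration.
good = (
    "<table><thead><tr><th>Quarter</th><th>Revenue</th></tr></thead>"
    "<tbody><tr><td>Q1</td><td>10</td></tr></tbody></table>"
)
bad = "<table><tr><td>Quarter</td><td>Revenue</td></tr></table>"
```

An empty set for a table means all three tags were found; a non-empty set lists what still needs to be added.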
High Priority
Retrieval-Augmented Generation (RAG)-Friendly Data Formatting
Structure your enterprise data so it can be efficiently 'chunked' and retrieved by RAG pipelines for accurate and context-aware AI responses.
Isolate related concepts and decision-making frameworks within discrete information packets, ideally no larger than 1,000 to 1,500 tokens per logical unit.
Explicitly state the primary subject or business context at the beginning of each data segment, avoiding reliance on implicit or 'floating' context.
Eliminate ambiguous pronoun references and replace them with specific corporate entity names, product identifiers, or process designations to ensure clarity for AI.
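The first two guidelines above can be sketched as a simple chunker that packs paragraphs under a token budget and opens every chunk with its subject. This is an illustration under stated assumptions: `count_tokens` approximates tokens by whitespace-separated words (a real pipeline would use its model's tokenizer), and `chunk_with_context` and the `Subject:` header format are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def chunk_with_context(subject: str, paragraphs: list[str], max_tokens: int = 1000) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens, each prefixed with the subject.

    A single paragraph larger than the budget is kept whole as its own
    oversized chunk rather than being split mid-thought.
    """
    header = f"Subject: {subject}"
    budget = max_tokens - count_tokens(header)
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        size = count_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + size > budget:
            chunks.append("\n\n".join([header, *current]))
            current, used = [], 0
        current.append(para)
        used += size
    if current:
        chunks.append("\n\n".join([header, *current]))
    return chunks


# Hypothetical example with a tiny budget so the split is easy to see.
chunks = chunk_with_context("Refund policy", ["a b c d", "e f g h", "i j"], max_tokens=10)
```

Repeating the subject in every chunk costs a few tokens per chunk but means a retrieved chunk never depends on context that was left behind in a neighboring one.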