llms.txt is a plain markdown file at the root of your domain that summarizes your site for language models. An AI crawler reads it in seconds, without parsing HTML, CSS, or waiting for JavaScript to execute. Anthropic, Vercel, Mintlify, FastAPI and Drizzle already implement it. This is the guide to do it right.
What it is exactly
The file lives at https://yourdomain.com/llms.txt. Plain markdown, no styling, no complicated metadata. Its goal is to answer, in fewer than 200 lines, the basic questions a model would ask about your site: what we are, what we offer, where to find more.
The standard was proposed by Jeremy Howard (Answer.AI) in September 2024. It is not a W3C standard yet, but adoption is moving fast among companies that depend on being cited by LLMs.
How it differs from files you already have
- robots.txt says what crawlers can read. Access control.
- sitemap.xml lists all indexable URLs with update dates. For traditional crawlers.
- llms.txt editorially summarizes the brand, in dense language, for AI crawlers.
The three coexist. They don't replace each other.
Recommended structure
The standard does not impose a rigid structure. The one working best in production looks like this:
# Your brand name
## What it is
A clear sentence of what you do and who you serve.
## Why it matters
The concrete problem you solve. In 1-2 lines.
## Services or products
- Item 1 — what it includes, in one line.
- Item 2 — what it includes, in one line.
- Item 3 — what it includes, in one line.
## Who we are
Team, location, relevant public sites or projects.
## Differentiators
What sets you apart, no marketingese. Facts.
## Cases or evidence
- Client / project X — what outcome we achieved.
- Client / project Y — what outcome we achieved.
## Contact
Email: hello@yourdomain.com
Web: https://yourdomain.com
## Languages
Spanish: https://yourdomain.com/es
English: https://yourdomain.com/en

The trick is writing as if your reader were an editor in a hurry. No empty adjectives, no long enumerations, no sentences that sound like a brochure.
When to add llms-full.txt
If your site has relevant technical documentation — APIs, frameworks, libraries, manuals — it's worth exposing it in a separate file: /llms-full.txt. A plain, ordered dump of all the documentation, with no HTML chrome.
Anthropic does this for their API. Drizzle does it for their ORM. The idea is that an agent programming against your product can read all the relevant documentation in a single request, without navigating separate pages.
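One way to produce such a dump is to concatenate your markdown sources at build time. Below is a minimal sketch, assuming your docs live as .md files under a single directory; `build_llms_full` and the directory layout are illustrative, not part of the standard.

```python
from pathlib import Path

def build_llms_full(docs_dir: Path, output: Path) -> int:
    """Concatenate every markdown doc into one plain file; returns the doc count."""
    parts = []
    for md in sorted(docs_dir.rglob("*.md")):
        rel = md.relative_to(docs_dir)
        # Prefix each doc with its path so an agent knows where a section came from.
        parts.append(f"# {rel}\n\n{md.read_text(encoding='utf-8').strip()}")
    output.write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)
```

Run something like this as part of your deploy, so /llms-full.txt never drifts from the docs it mirrors.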
Common mistakes
- Filling it with slogans. “Innovation leaders” tells a model nothing. Verifiable facts do.
- Copying the homepage as-is. The homepage carries marketing; llms.txt needs compressed, useful information.
- Forgetting to maintain it. When your services change or your positioning sharpens, llms.txt has to reflect that. Crawlers come back.
- Not declaring it in robots.txt. Not strictly necessary, but some crawlers discover it faster if you list it.
- Making it 5,000 words long. The file is designed for density. If you exceed 200-300 lines, focus is lost. Long form goes in llms-full.txt.
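The line budget is easy to enforce automatically. Here is a hypothetical lint sketch: `lint_llms_txt` and the `REQUIRED` markers are assumptions based on the template above, not anything the standard mandates.

```python
# Markers assumed from the template earlier in this guide -- adjust to your own headings.
REQUIRED = ("# ", "## Contact")

def lint_llms_txt(text: str, max_lines: int = 300) -> list:
    """Return a list of problems; an empty list means the file passes."""
    lines = text.splitlines()
    problems = []
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines, over the {max_lines}-line budget")
    for marker in REQUIRED:
        if not any(line.startswith(marker) for line in lines):
            problems.append(f"no line starts with {marker!r}")
    return problems
```

Wiring a check like this into CI keeps the file dense as the site grows.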
How to verify it works
- Serve it with Content-Type: text/markdown or text/plain. Some crawlers get confused by strange MIME types.
- Ask an LLM with web search: “Read https://yourdomain.com/llms.txt and summarize what the brand does.” If the summary is accurate, the file is well written.
- Check your access logs for user-agents like GPTBot, Claude-Web, PerplexityBot. If you see requests to the file, the crawlers are consuming it.
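The first two checks can be scripted. A minimal sketch with the standard library; `mime_ok` and `check_llms_txt` are illustrative names, and the accepted types follow the recommendation above.

```python
from urllib.request import urlopen

ACCEPTED = ("text/markdown", "text/plain")

def mime_ok(content_type: str) -> bool:
    """True if the Content-Type (ignoring charset parameters) is one crawlers expect."""
    return content_type.split(";")[0].strip().lower() in ACCEPTED

def check_llms_txt(url: str) -> dict:
    """Fetch llms.txt and report status code, MIME type, and length."""
    with urlopen(url) as resp:
        status = resp.status
        mime = resp.headers.get("Content-Type", "")
        body = resp.read().decode("utf-8", errors="replace")
    return {
        "status": status,
        "mime": mime,
        "mime_ok": mime_ok(mime),
        "lines": len(body.splitlines()),
    }
```

Point `check_llms_txt` at your own domain after each deploy; a 200 status, an accepted MIME type, and a line count under budget are the three things worth asserting.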
Why publish it even if few respect it today
It's a near-zero-cost bet with material upside. Today some crawlers respect it; in 12 months more will. Your llms.txt will be ready when the ecosystem finishes adopting it — and it will be read when it matters.
The precedent: robots.txt appeared in 1994 as a convention with no technical enforcement. Today it's respected by every serious crawler. llms.txt is on the same curve, a decade later.