Your AI Strategy Has a Blind Spot: An SEO and AEO Audit of vibescoder.dev

I spend a lot of time thinking about how AI agents discover and consume content. I run a company that builds developer tools. I write a blog about building with AI agents. And most importantly, I'm married to a woman who runs an AI consulting practice. Through the home-office wall I've heard her warn many a client that they have a silent suppressor in their content strategy if they're a Cloudflare customer. She recommends a site audit.

And she was right. Until this morning, every major AI crawler was blocked from reading my site. Not by choice. Not by misconfiguration. By a Cloudflare setting I'd already turned off — one that got silently re-enabled by a different setting I didn't know existed.

If you're a content creator, marketer, or engineer who cares about whether ChatGPT, Perplexity, Google AI Overviews, or Claude can find your work — read this. The infrastructure between your content and your audience may be working against you.

The TL;DR for Non-Technical Readers

If you don't want to read the whole audit, here's what matters:

- Cloudflare's free tier blocks AI search engines by default. If your site uses Cloudflare (and millions do), your content may be invisible to ChatGPT, Perplexity, Claude, and Google's AI features — even if you never asked for that.
- There are now two categories of discoverability: traditional SEO (Google search results) and AEO — Answer Engine Optimization (AI-powered search and assistants). You need both, and they require different things.
- The fix for Cloudflare takes 60 seconds — but you have to know it exists. Go to Security → Settings → "Manage your robots.txt" and switch from "Instruct AI bots to not scrape content" to either "Content Signals Policy" or "Disable robots.txt configuration."
- There's a new file called llms.txt that's becoming the robots.txt for AI. It tells AI agents what your site is, what it covers, and where to find content. If you don't have one, you're leaving discoverability on the table.

The TL;DR for Technical Readers

We ran a full SEO + AEO audit against vibescoder.dev and found 20 issues across 4 severity levels. The highlights:

- 4 P0 (critical): Cloudflare's managed robots.txt was blocking GPTBot, ClaudeBot, Google-Extended, and 5 others. The RSS feed had the wrong URL prefix (15 broken links). Sitemap.xml was referenced but returned 404. Duplicate User-agent: * blocks in robots.txt.
- 6 P1 (high): No JSON-LD structured data. No llms.txt. No canonical URLs. No heading anchor IDs. Missing article:author/tag meta. Homepage force-dynamic.
- Everything was fixed in a single session — 17 files changed, 428 insertions, pushed and deployed.

The commit: SEO/AEO overhaul.

The Audit

I asked my Coder agent to evaluate vibescoder.dev on two dimensions: traditional search engine optimization (SEO) and Answer Engine Optimization (AEO) — making the site discoverable and citable by AI agents like ChatGPT Search, Perplexity, Google AI Overviews, and Claude. The agent cloned the engine repo, crawled the live site, inspected every response header, parsed every meta tag, and cross-referenced the codebase against both SEO and AEO best practices. The results were humbling.

The Cloudflare Gotcha (Yes, Again)

I wrote about Cloudflare's AI crawler settings two weeks ago. In that post, I specifically called out that Cloudflare's free tier has "Block AI bots" and "AI Labyrinth" turned on by default. I explicitly turned both off. I even wrote this: "If your site exists for thought leadership, you want AI services to find, index, and cite your content. Blocking AI crawlers is blocking your distribution channel."

I was right. And I was still blocked.

The problem: Cloudflare has a separate setting called "Manage your robots.txt" under Security → Settings. It's not the same as "Block AI bots." It's a newer feature that injects directives directly into your robots.txt file at the edge — after your origin server responds.

Here's what the agent found when it compared my repo's robots.txt (100 bytes, 7 lines) to what Cloudflare was actually serving: Cloudflare was prepending 1,738 bytes of content — including Disallow: / rules for ClaudeBot, GPTBot, Google-Extended, Amazonbot, CCBot, Bytespider, and meta-externalagent — without updating the Content-Length header.
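You can reproduce that check yourself. Here's a minimal sketch, not the agent's actual tooling: it fetches the robots.txt your CDN actually serves and compares it to the file in your repo. The domain and file path are placeholders.

// check-robots.ts: compare the robots.txt in your repo with what the edge serves.
// Sketch only; SITE and REPO_FILE are placeholders for your own setup.
import { readFile } from "node:fs/promises";

const SITE = "https://vibescoder.dev";   // your domain
const REPO_FILE = "public/robots.txt";   // wherever your origin file lives

async function main() {
  const served = await (await fetch(`${SITE}/robots.txt`)).text();
  const origin = await readFile(REPO_FILE, "utf8");

  if (served.trim() === origin.trim()) {
    console.log("OK: edge serves your robots.txt unmodified");
    return;
  }
  console.log(
    `MISMATCH: origin is ${Buffer.byteLength(origin)} bytes, edge serves ${Buffer.byteLength(served)}`
  );
  // If the CDN prepended directives, your own content appears later in the response.
  const idx = served.indexOf(origin.trim().slice(0, 40));
  console.log("Injected prefix:\n", idx > 0 ? served.slice(0, idx) : served);
}

main().catch(console.error);

Run it with npx tsx check-robots.ts (Node 18+ for the global fetch); any injected prefix prints immediately.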
The setting responsible? "Instruct AI bots to not scrape content," which was selected by default.

The fix: Security → Settings → "Manage your robots.txt" → select "Disable robots.txt configuration." This tells Cloudflare to stop modifying your robots.txt entirely. Your origin file gets served as-is.

Why "Disable" and not "Content Signals Policy"? The Content Signals option keeps a Content-Signal: ai-train=no directive, which tells AI crawlers not to use your content for model training. That sounds reasonable — but for a personal blog trying to maximize reach, being in the training corpus means AI models are more likely to know about you and reference your ideas. The risk it protects against (content absorbed without credit) is theoretical. The cost (reduced presence in AI systems) is concrete.

Gotcha #1: Cloudflare has three separate AI-related settings, and changing one doesn't affect the others. You need to check all three: "Block AI bots," "AI Labyrinth," and "Manage your robots.txt." I had turned off #1 and #2 weeks ago. But #3 was still on — silently rewriting my robots.txt at the CDN layer.

Here's the full picture — the Security Overview flagging the AI-related action items, and each of the three settings:

What Is AEO?

AEO — Answer Engine Optimization — is the practice of making your content discoverable and citable by AI agents. (You'll also see it referred to as AI Engine Optimization or Agentic Engine Optimization — the discipline is new enough that the name is still settling.) It's the emerging counterpart to SEO. Where SEO focuses on Google's traditional index, AEO targets the systems that power ChatGPT Search, Perplexity, Google AI Overviews, Claude, and whatever comes next. You need both. Many of the improvements below help both, but some are AEO-specific.

AEO-Specific Changes

These improvements specifically target AI agent discoverability.

llms.txt and llms-full.txt

llms.txt is an emerging convention — think of it as robots.txt for AI comprehension rather than crawling. It tells AI agents what your site is, what topics it covers, and where to find content. We created two files:

- /llms.txt — a structured summary: site description, author, topics, key posts, and links
- /llms-full.txt — a dynamic route that serves every published post's full content as plain text

The full-content version is the important one. When an AI agent wants to cite your work, it needs the actual content — not just metadata. llms-full.txt is a single endpoint that gives it everything.
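For the dynamic route, here's a minimal sketch of the idea as a Next.js App Router handler. getPublishedPosts() is a stand-in for however your content layer loads posts, not the site's actual code.

// src/app/llms-full.txt/route.ts: serve every post as one plain-text document.
// Sketch only; getPublishedPosts() is a hypothetical content-layer helper.
import { getPublishedPosts } from "@/lib/posts";

export async function GET() {
  const posts = await getPublishedPosts();
  const body = posts
    .map(
      (p) =>
        `# ${p.title}\nURL: https://vibescoder.dev/posts/${p.slug}\n\n${p.content}`
    )
    .join("\n\n---\n\n");
  // Plain text is the point: one fetch gives an agent the whole corpus.
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}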
Person Schema with sameAs

JSON-LD structured data tells AI engines who wrote something and where else that person exists online. The sameAs property connects identity across platforms:

{
  "@type": "Person",
  "name": "Rob Whiteley",
  "url": "https://vibescoder.dev/about",
  "jobTitle": "CEO",
  "sameAs": [
    "https://www.linkedin.com/in/rwhiteley",
    "https://github.com/carryologist",
    "https://x.com/rwhiteley0"
  ],
  "worksFor": {
    "@type": "Organization",
    "name": "Coder",
    "url": "https://coder.com"
  }
}

When ChatGPT or Perplexity decides whether to cite "Rob Whiteley, CEO of Coder" in a response about AI-assisted development, this structured data is what gives it confidence in the attribution.

Full-Content RSS

The existing RSS feed only had <description> (a short excerpt). AI agents that consume RSS — and Perplexity in particular indexes it — get significantly more context from full-content feeds. We added <content:encoded> with the full post body, plus <author> and <managingEditor> tags.
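Most Next.js blogs assemble the feed XML by hand, so a sketch of what one enriched item can look like, with the post shape assumed and escaping omitted for brevity:

// Building one RSS <item> with full content (sketch; the `post` shape is assumed,
// and real code should XML-escape title, author, and description).
function rssItem(post: { title: string; slug: string; author: string; html: string }) {
  const url = `https://vibescoder.dev/posts/${post.slug}`;
  return `
  <item>
    <title>${post.title}</title>
    <link>${url}</link>
    <guid>${url}</guid>
    <author>${post.author}</author>
    <description>${post.html.slice(0, 200)}</description>
    <content:encoded><![CDATA[${post.html}]]></content:encoded>
  </item>`;
}
// The channel element must declare the namespace for content:encoded:
// <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">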
Unblocking AI Crawlers

The Cloudflare fix described above. The single highest-impact AEO change — going from completely invisible to fully accessible.

SEO-Specific Changes

These target traditional Google search.

Sitemap.xml

robots.txt referenced it. It didn't exist. Every SEO tool and Google Search Console would flag this. We created src/app/sitemap.ts with dynamic generation — all posts, tags, and static pages with lastmod dates from the changelog, along the lines of the sketch below.
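Next.js generates the XML for you from a sitemap.ts file. A minimal sketch of the shape, again with getPublishedPosts() as a hypothetical content helper:

// src/app/sitemap.ts: dynamic sitemap from the content layer (sketch).
import type { MetadataRoute } from "next";
import { getPublishedPosts } from "@/lib/posts"; // hypothetical helper

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getPublishedPosts();
  const base = "https://vibescoder.dev";

  return [
    { url: base, lastModified: new Date(), changeFrequency: "daily", priority: 1 },
    { url: `${base}/about`, changeFrequency: "monthly", priority: 0.5 },
    ...posts.map((p) => ({
      url: `${base}/posts/${p.slug}`,
      lastModified: p.updatedAt, // e.g. the lastmod date from the changelog
      changeFrequency: "weekly" as const,
      priority: 0.8,
    })),
  ];
}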
Canonical URLs

No page had <link rel="canonical">. Without it, Google can treat URL variants (?utm_source=twitter, ?ref=hackernews) as separate pages. We added explicit canonical URLs to every page type — homepage, posts, about, tags, and individual tag pages.

Homepage Caching

The homepage was set to force-dynamic — every request hit the server with zero caching. For a blog that publishes daily at most, that's unnecessary. We switched to ISR with a 60-second revalidation window. (Vercel still serves it dynamically due to a cookies() call for admin detection — a future refactor.)
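Both changes are a few lines in Next.js. A sketch of the pattern, where the params shape follows the App Router convention and the revalidate export is what replaces force-dynamic:

// src/app/posts/[slug]/page.tsx: canonical URL per post (sketch).
import type { Metadata } from "next";

export async function generateMetadata({
  params,
}: {
  params: { slug: string };
}): Promise<Metadata> {
  return {
    alternates: {
      canonical: `https://vibescoder.dev/posts/${params.slug}`,
    },
  };
}

// src/app/page.tsx: ISR instead of force-dynamic (sketch).
export const revalidate = 60; // re-render at most once per minute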
Custom 404 Page

The default Next.js 404 is a dead end. Our custom version shows recent posts and navigation links — keeping both users and crawlers moving through the site instead of bouncing.

Changes That Help Both

Most improvements benefit both SEO and AEO.

JSON-LD Structured Data

The single biggest miss. We added three schema types:

- WebSite — site-level metadata with author info (every page)
- BlogPosting — per-post schema with headline, dates, author, keywords, reading time (post pages)
- BreadcrumbList — navigation hierarchy (post pages)

For SEO, this enables rich results in Google — article carousels, author info, breadcrumbs. For AEO, it's how AI engines understand content relationships and authorship with confidence.
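For a sense of shape, a sketch of a BlogPosting block; the dates, keywords, and slug here are illustrative placeholders, not values copied from the site:

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Your AI Strategy Has a Blind Spot",
  "datePublished": "2025-01-15",
  "dateModified": "2025-01-16",
  "author": { "@type": "Person", "name": "Rob Whiteley" },
  "keywords": ["AEO", "SEO", "Cloudflare"],
  "timeRequired": "PT8M",
  "mainEntityOfPage": "https://vibescoder.dev/posts/ai-strategy-blind-spot"
}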
Heading Anchor IDs

Added rehype-slug to the MDX pipeline. Every H2 and H3 now gets an auto-generated id attribute.

- SEO: Google uses these for "jump to" links in search results and featured snippets.
- AEO: AI agents cite specific sections via fragment URLs (#the-cloudflare-gotcha). Without heading IDs, citations can only link to the full page.
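If your posts go through next-mdx-remote, wiring this up looks roughly like the sketch below; the actual pipeline on vibescoder.dev may differ, and any MDX compiler that accepts rehype plugins works the same way.

// Rendering a post with rehype-slug in the MDX pipeline (sketch).
// Assumes next-mdx-remote; adapt to whatever compiles your MDX.
import { MDXRemote } from "next-mdx-remote/rsc";
import rehypeSlug from "rehype-slug";

export function PostBody({ source }: { source: string }) {
  return (
    <MDXRemote
      source={source}
      options={{ mdxOptions: { rehypePlugins: [rehypeSlug] } }}
    />
  );
}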
RSS Feed Fix

Every link in the RSS feed was a 404. The feed used /blog/ as the URL prefix, but the actual routes use /posts/. All 15 posts were broken. One-line fix, massive impact — RSS is a primary discovery mechanism for both Google and AI agents.

Article Meta Tags

Added article:author, article:tag, article:modified_time, and og:site_name to post OpenGraph metadata. These help both Google and AI engines categorize and attribute content correctly.

Image Improvements

MDX images now render inside <figure> with <figcaption> elements, and images without explicit alt text get an auto-generated fallback from the filename. Both changes improve how crawlers — traditional and AI — understand image content.
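In the App Router, those OpenGraph article tags come from the metadata API. A sketch, with the post object standing in for your real frontmatter:

// Emitting article:* OpenGraph tags from generateMetadata (sketch;
// `post` is a stand-in for the real frontmatter lookup).
import type { Metadata } from "next";

export async function generateMetadata(): Promise<Metadata> {
  const post = {
    author: "Rob Whiteley",
    tags: ["AEO", "SEO"],
    modified: new Date(),
  };
  return {
    openGraph: {
      type: "article",
      siteName: "vibescoder.dev",                 // og:site_name
      authors: [post.author],                     // article:author
      tags: post.tags,                            // article:tag
      modifiedTime: post.modified.toISOString(),  // article:modified_time
    },
  };
}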
The Cloudflare Settings While We Were in the Dashboard

While fixing the robots.txt issue, we also optimized two other Cloudflare settings and confirmed a third:

- Early Hints — enabled. Cloudflare sends 103 Early Hints responses from the edge, letting browsers start loading fonts and CSS before Vercel even responds.
- Smart Tiered Caching — enabled. Cloudflare edge nodes share cached content with each other, reducing origin hits. Ready to deliver benefits once ISR caching is fully enabled.
- AI Labyrinth — confirmed still off. This injects fake content links to trap AI crawlers — the opposite of what a content site wants.
{ "@type": "Person", "name": "Rob Whiteley", "url": "https://vibescoder.dev/about", "jobTitle": "CEO", "sameAs": [ "https://www.linkedin.com/in/rwhiteley", "https://github.com/carryologist", "https://x.com/rwhiteley0" ], "worksFor": { "@type": "Organization", "name": "Coder", "url": "https://coder.com" } } CODE_BLOCK: { "@type": "Person", "name": "Rob Whiteley", "url": "https://vibescoder.dev/about", "jobTitle": "CEO", "sameAs": [ "https://www.linkedin.com/in/rwhiteley", "https://github.com/carryologist", "https://x.com/rwhiteley0" ], "worksFor": { "@type": "Organization", "name": "Coder", "url": "https://coder.com" } } CODE_BLOCK: { "@type": "Person", "name": "Rob Whiteley", "url": "https://vibescoder.dev/about", "jobTitle": "CEO", "sameAs": [ "https://www.linkedin.com/in/rwhiteley", "https://github.com/carryologist", "https://x.com/rwhiteley0" ], "worksFor": { "@type": "Organization", "name": "Coder", "url": "https://coder.com" } } - Cloudflare's free tier blocks AI search engines by default. If your site uses Cloudflare (and millions do), your content may be invisible to ChatGPT, Perplexity, Claude, and Google's AI features — even if you never asked for that. - There are now two categories of discoverability. Traditional SEO (Google search results) and AEO — Answer Engine Optimization (AI-powered search and assistants). You need both. They require different things. - The fix for Cloudflare takes 60 seconds — but you have to know it exists. Go to Security → Settings → "Manage your robots.txt" and switch from "Instruct AI bots to not scrape content" to either "Content Signals Policy" or "Disable robots.txt configuration." - There's a new file called llms.txt that's becoming the robots.txt for AI. It tells AI agents what your site is, what it covers, and where to find content. If you don't have one, you're leaving discoverability on the table. - 4 P0 (critical): Cloudflare's managed robots.txt was blocking GPTBot, ClaudeBot, Google-Extended, and 5 others. RSS feed had wrong URL prefix (15 broken links). Sitemap.xml was referenced but returned 404. Duplicate User-agent: * blocks in robots.txt. - 6 P1 (high): No JSON-LD structured data. No llms.txt. No canonical URLs. No heading anchor IDs. Missing article:author/tag meta. Homepage force-dynamic. - Everything was fixed in a single session — 17 files changed, 428 insertions, pushed and deployed. - /llms.txt — a structured summary: site description, author, topics, key posts, and links - /llms-full.txt — a dynamic route that serves every published post's full content as plain text - WebSite — site-level metadata with author info (every page) - BlogPosting — per-post schema with headline, dates, author, keywords, reading time (post pages) - BreadcrumbList — navigation hierarchy (post pages) - SEO: Google uses these for "jump to" links in search results and featured snippets. - AEO: AI agents cite specific sections via fragment URLs (#the-cloudflare-gotcha). Without heading IDs, citations can only link to the full page. - Early Hints — enabled. Cloudflare sends 103 Early Hints responses from the edge, letting browsers start loading fonts and CSS before Vercel even responds. - Smart Tiered Caching — enabled. Cloudflare edge nodes share cached content with each other, reducing origin hits. Ready to deliver benefits once ISR caching is fully enabled. - AI Labyrinth — confirmed still off. This injects fake content links to trap AI crawlers — the opposite of what a content site wants. 
By the Numbers

- 1,738 bytes of robots.txt injected by Cloudflare without updating Content-Length
- 8 AI crawlers blocked (GPTBot, ClaudeBot, Google-Extended, Amazonbot, CCBot, Bytespider, Applebot-Extended, meta-externalagent)
- 15 RSS feed links returning 404 — every single one
- 0 → 3 JSON-LD schema types (WebSite, BlogPosting, BreadcrumbList)
- 0 → 5 pages with canonical URLs
- 17 files changed, 428 lines added
- 3 Cloudflare settings that control AI crawlers — and you have to check all of them
- 60 seconds to fix the Cloudflare setting that was blocking all AI visibility
- ~2 hours for the full audit and implementation of all 20 changes
- 1 blog post that I thought had solved this problem — it hadn't