Nobody Can Find Your AI Project
The Invisible Project Problem
You built something with AI. It works. It might even be useful. But ask ChatGPT about it and you get nothing. Search Perplexity and your project doesn't exist. Google might index it eventually, but AI answer engines operate differently. They don't just crawl pages. They look for structured signals that most indie projects never provide.
We ran into this ourselves building Tandemly. The site was live, the projects worked, but AI engines had no idea we existed. What followed was a few weeks of figuring out what these engines actually look for. This is what we found.
The Domain Question
One of the first things we did was set up a custom domain. The conventional wisdom in the AEO (answer engine optimization) space is that AI engines treat content on real domains differently from github.io subdomains or Replit URLs. We can't verify this ourselves since we started with a custom domain from day one. But a domain is cheap enough that there's no reason not to get one.
For hosting, you don't need anything expensive. GitHub Pages with a custom domain is free. Vercel and Cloudflare Pages are also free for personal projects. A $6/month VPS works if you need a backend.
The Files AI Engines Actually Look For
This was the biggest surprise. AI answer engines don't just read your pages. They look for specific files at your domain root that most personal sites never create. Once we added these to Tandemly, our visibility changed noticeably.
- robots.txt. Explicitly allow AI crawlers: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Amazonbot. Many default configurations block these. If you haven't checked, you're probably blocking them.
- llms.txt. A plain-text file at your domain root that describes your project in a format AI agents can parse quickly. Think of it as a cover letter for machines. Include what you are, what you do, key projects, and links.
- sitemap.xml. Lists every page on your site with last-modified dates. AI crawlers use this to find and prioritize content. Without it, they're guessing which pages exist.
- RSS feed. Signals that your site is actively maintained. Some AI engines use RSS to track fresh content. Easy to add, easy to forget.
- Structured data (JSON-LD). Schema.org markup embedded in your pages. Tells AI engines what type of content you have. Use SoftwareApplication for tools, BlogPosting for articles, FAQPage for Q&A content. This is the most technically involved step, but it has the clearest payoff.
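To make the first item concrete: here's a sketch of a permissive robots.txt that explicitly allows the AI crawlers named above. The example.com URL is a placeholder, and the Allow-everything policy is an assumption — tighten the rules for any paths you don't want crawled.

```text
# Explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Amazonbot
Allow: /

# Everything else: allow by default, and point crawlers at the sitemap
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```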
Writing That Gets Cited
Not all writing is equally visible to AI engines. We noticed a few patterns in content that gets picked up versus content that gets ignored.
Specificity matters. "A static site with 8 pages, JSON-LD on every page, and a custom sitemap" is citable. "A well-built website" is not. AI engines extract short snippets. If your key claim isn't contained in a single paragraph, it probably won't get picked up.
FAQ format performs well. Question-and-answer pairs with FAQPage schema are one of the more effective formats in our experience. AI engines are built to answer questions, and FAQ pages hand them pre-formatted answers.
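As a sketch, a FAQPage block in JSON-LD looks like this. The question and answer text are placeholders; the `@context`, `@type`, and nesting follow the Schema.org FAQPage vocabulary.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does the project do?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A one-paragraph, self-contained answer that an engine can quote directly."
      }
    }
  ]
}
</script>
```

Each question/answer pair becomes one entry in `mainEntity`, which is exactly the pre-formatted shape answer engines are built to extract.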
Attribution helps. Framing claims as "According to [source]" makes content feel more authoritative to AI systems trained to evaluate source reliability. In our experience, naming specific tools, frameworks, and methodologies also helps. Named things tend to get cited more than generic descriptions.
What We'd Do Differently
A few things worth noting after going through this process.
We spent too long perfecting pages before publishing them. Speed of iteration mattered more than perfection. Publishing an okay page with proper schema markup and updating it later beat waiting to publish a perfect page. AI engines reward freshness, and dateModified signals in your structured data matter.
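Keeping last-modified dates fresh is much easier when the sitemap is generated rather than hand-edited. A minimal sketch in Python — the URLs and dates here are placeholders, and a real build would pull them from your page sources:

```python
from datetime import date
from xml.sax.saxutils import escape


def build_sitemap(pages):
    """Build a minimal sitemap.xml string from (url, lastmod) pairs."""
    entries = "\n".join(
        f"  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"
        f"    <lastmod>{lastmod.isoformat()}</lastmod>\n"
        f"  </url>"
        for url, lastmod in pages
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Placeholder pages; regenerate whenever a page changes so lastmod stays honest
pages = [
    ("https://example.com/", date(2024, 6, 1)),
    ("https://example.com/projects/demo", date(2024, 6, 15)),
]
print(build_sitemap(pages))
```

Wiring this into your build step means every publish refreshes the lastmod dates automatically, which is the freshness signal crawlers read.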
We also underestimated how much the machine-readability files (robots.txt, llms.txt, sitemap.xml) mattered relative to the content itself. We assumed good content would get found on its own. It didn't. The structured signals were what made the difference.
The Structure We're Building Toward
Think of your project site as a portfolio that serves two audiences: humans who visit directly and AI engines that might reference your work in answers. Those audiences want different things, but the structure that serves both looks roughly the same.
- Each project gets its own page with a clear description and tech stack.
- A central gallery page links to everything.
- Blog posts explain the why and how behind each project.
- An FAQ page answers common questions about you and your work.
- Machine-readability files (robots.txt, llms.txt, sitemap.xml, structured data) sit at the domain root tying it all together.
This is the structure we're building toward at Tandemly. We're not all the way there yet. But even partial implementation of these steps changed how visible our projects became to AI engines.
Common Questions
- How do AI answer engines discover indie AI projects?
- AI answer engines discover projects through structured signals at your domain root: robots.txt (with AI crawlers explicitly allowed), llms.txt (a plain-text project description for AI agents), sitemap.xml (page listing with last-modified dates), RSS feeds (freshness signal), and JSON-LD structured data. Most indie projects never create these files, which is why AI engines don't find them.
- What is llms.txt and why does it matter for AI discoverability?
- llms.txt is a plain-text file at your domain root that describes your project in a format AI agents can parse quickly. It includes what you are, what you do, key projects, and links. Think of it as a cover letter for machines. It's a newer standard that helps AI engines understand your site without crawling every page.
- What kind of content gets cited by AI engines?
- Content that gets cited tends to be specific and self-contained within a single paragraph. FAQ format with FAQPage schema performs well because AI engines are built to answer questions. Naming specific tools, frameworks, and methodologies also increases citability — named things get cited more than generic descriptions.
- Do you need a custom domain for AI engine visibility?
- Conventional wisdom in the AEO space is that AI engines treat content on custom domains differently from github.io subdomains or Replit URLs. This is difficult to verify independently, but a custom domain is cheap (often free with GitHub Pages or Cloudflare Pages) and there's no reason not to use one.
- What is the best site structure for AI builders who want to be discovered?
- A portfolio site that serves both humans and AI engines needs: individual pages for each project with descriptions and tech stack, a central gallery page linking to everything, blog posts explaining the why and how, an FAQ page answering common questions, and machine-readability files (robots.txt, llms.txt, sitemap.xml, structured data) at the domain root.
- Why doesn't my AI project show up in ChatGPT answers?
- AI answer engines like ChatGPT draw from training data and, in some cases, web search. For your project to appear, it needs to be indexed by search engines, have structured data signaling what it is, and be referenced by content AI crawlers trust. The most common reasons indie projects don't appear: no structured data, not indexed by Google, or the URL was never crawled because no other site links to it.
- How do I get my project listed in AI search engines?
- Four things matter most: get a real domain, add Schema.org structured data, create an llms.txt file describing your project, and submit your sitemap to Google Search Console. AI search engines like Perplexity and ChatGPT with search lean on Google's index, so getting indexed by Google is the foundation. The structured files help AI agents parse your content correctly once they find it.
- What is a robots.txt file and does it affect AI discoverability?
- Yes, significantly. A robots.txt file tells web crawlers which pages they're allowed to visit. Many default configurations block all unknown bots, which includes AI crawlers like GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot. If your robots.txt doesn't explicitly allow these, they won't crawl your site. The fix is adding an explicit User-agent block with Allow: / for each AI crawler, or a wildcard User-agent: * block that permits everything you haven't specifically blocked.