2026-04-14 · 5 min read

What Your robots.txt Reveals to Attackers

What is robots.txt?

robots.txt is a plain text file at the root of your website that tells search engine crawlers which paths they may crawl and which to skip. It follows the Robots Exclusion Protocol (RFC 9309) and is publicly accessible at yourdomain.com/robots.txt.

The problem: it is also a roadmap for attackers.

How robots.txt helps attackers

When you add a path to robots.txt, you are telling the entire internet "this path exists and I do not want it crawled." Attackers read robots.txt first because it often reveals hidden areas of a site.

Common leaks

  • **Admin panels** — Disallow: /admin, /wp-admin, /dashboard
  • **API endpoints** — Disallow: /api/internal, /api/v2/debug
  • **Staging environments** — Disallow: /staging, /dev, /test
  • **Backup files** — Disallow: /backups, /db-backup
  • **Configuration paths** — Disallow: /config, /.env, /settings
  • **User data** — Disallow: /uploads/private, /user-files

Each of these entries confirms the path exists. An attacker does not even need to guess: you have told them where to look.
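
Enumerating those paths takes a few lines of code. The sketch below parses a hypothetical robots.txt (the `sample` content and `disallowed_paths` helper are illustrative, not from any real scanner); in practice an attacker fetches `https://target.example/robots.txt` with any HTTP client and does exactly this:

```python
# Sketch: how trivially Disallow entries become a target list.
# The sample file content below is hypothetical.
sample = """\
User-agent: *
Disallow: /admin
Disallow: /api/internal
Disallow: /backups
"""

def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path listed in a Disallow directive."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(sample))  # ['/admin', '/api/internal', '/backups']
```

Every path in that output is a confirmed, interesting place to probe next.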

The fundamental misunderstanding

robots.txt is not a security mechanism. It is a polite request to well-behaved crawlers. Malicious bots and human attackers ignore it entirely.

Disallowing a path in robots.txt does not:

  • Block access to the page
  • Require authentication
  • Hide the path from view
  • Prevent the page from being linked or shared

If a page should not be accessed by unauthorized users, it must be protected by authentication, not by a robots.txt entry.
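
You can see the "polite request" nature of the protocol in Python's standard library. `urllib.robotparser` is what a well-behaved crawler uses to ask permission; nothing in the protocol stops a client that simply never asks:

```python
from urllib import robotparser

# A polite crawler parses the rules and checks before fetching.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /admin"])

print(rp.can_fetch("*", "https://example.com/admin"))   # False
print(rp.can_fetch("*", "https://example.com/public"))  # True

# An attacker skips this step entirely: a plain HTTP request to
# /admin succeeds unless the *server* enforces authentication.
```

The `False` above is advisory. It only ever constrains clients that choose to consult it.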

Security best practices

1. Audit your current robots.txt

Check what your robots.txt currently reveals. [CQwerty Shield's robots.txt checker](/tools/robots-txt-checker) analyses your file and flags entries that may expose sensitive paths.

2. Remove sensitive paths

Do not list admin panels, API endpoints, or internal tools in robots.txt. Protect them with authentication instead.

3. Keep it minimal

A good robots.txt for most sites is simple:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```

An empty Disallow rule explicitly allows everything. Only add Disallow rules for paths that are public but not worth indexing (such as search result pages or print-friendly versions).

4. Protect sensitive areas properly

  • Admin panels: require authentication + IP allowlisting
  • API endpoints: require API keys or OAuth tokens
  • Staging environments: use a different domain or require VPN access
  • Backup files: do not store them in web-accessible directories
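
As a minimal sketch of what "protect with authentication" means server-side (the credentials and the `handle` helper are hypothetical, not a real framework): the admin path challenges every request that lacks valid HTTP Basic credentials, regardless of what robots.txt says.

```python
import base64

# Hypothetical credentials for illustration only.
VALID = base64.b64encode(b"admin:s3cret").decode()

def handle(path: str, headers: dict) -> int:
    """Return the HTTP status code for a request (sketch, not a framework)."""
    if path.startswith("/admin"):
        auth = headers.get("Authorization", "")
        if auth != f"Basic {VALID}":
            return 401  # challenge instead of serving the page
    return 200

print(handle("/admin/users", {}))                                   # 401
print(handle("/admin/users", {"Authorization": f"Basic {VALID}"}))  # 200
```

The point is that the check runs on the server for every request; an attacker who knows the path gains nothing by knowing it.
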

5. Use noindex instead

If you want a page to exist but not appear in search results, use a `<meta name="robots" content="noindex">` tag or an X-Robots-Tag HTTP header. This keeps the path out of search engines without advertising its existence in robots.txt.
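
The header variant can be sketched with nothing but the standard library's WSGI tooling (the app below is illustrative, not a production server): the page is served normally but carries `X-Robots-Tag: noindex`, so compliant search engines drop it from results while robots.txt never mentions the path.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    # Header equivalent of <meta name="robots" content="noindex">.
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("X-Robots-Tag", "noindex"),
    ])
    return [b"<p>Reachable by link, absent from search results.</p>"]

# Exercise the app without a real server:
captured = {}
environ = {}
setup_testing_defaults(environ)
app(environ, lambda status, headers: captured.update(status=status, headers=headers))
print(captured["headers"])
```

Unlike a robots.txt entry, this tells crawlers what to do with the page only once they have already fetched it; it never publishes a list of paths.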

Key takeaways

  • robots.txt is public. Everyone can read it, including attackers.
  • Listing sensitive paths in robots.txt is worse than not listing them at all.
  • Use authentication and access controls for real security, not robots.txt.
  • Keep your robots.txt minimal and audit it regularly.
  • [Check your robots.txt now](/tools/robots-txt-checker)

Ready to check your domain?

Run all 18 security checks in 2 minutes. Free, no signup required.

Check Your robots.txt