2026-04-14 · 5 min read

What Your robots.txt Reveals to Attackers

What is robots.txt?

robots.txt is a plain text file at the root of your website that tells search engine crawlers which paths they may crawl and which to skip. It follows the Robots Exclusion Protocol (RFC 9309) and is publicly accessible at yourdomain.com/robots.txt.

The problem: it is also a roadmap for attackers.

How robots.txt helps attackers

When you add a path to robots.txt, you are telling the entire internet "this path exists and I do not want it crawled." Attackers read robots.txt first because it often reveals hidden areas of a site.

Common leaks

  • **Admin panels** — Disallow: /admin, /wp-admin, /dashboard
  • **API endpoints** — Disallow: /api/internal, /api/v2/debug
  • **Staging environments** — Disallow: /staging, /dev, /test
  • **Backup files** — Disallow: /backups, /db-backup
  • **Configuration paths** — Disallow: /config, /.env, /settings
  • **User data** — Disallow: /uploads/private, /user-files

Each of these entries confirms the path exists. An attacker does not even need to guess: you have told them where to look.
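
Enumerating those paths takes a few lines of code. The sketch below parses a hypothetical robots.txt (the `sample` content and `disallowed_paths` helper are illustrative, not from any real scanner); in practice an attacker fetches `https://target.example/robots.txt` with any HTTP client and does exactly this:

```python
# Sketch: how trivially Disallow entries become a target list.
# The sample file content below is hypothetical.
sample = """\
User-agent: *
Disallow: /admin
Disallow: /api/internal
Disallow: /backups
"""

def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path listed in a Disallow directive."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(sample))  # ['/admin', '/api/internal', '/backups']
```

Every path in that output is a confirmed, interesting place to probe next.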

The fundamental misunderstanding

robots.txt is not a security mechanism. It is a polite request to well-behaved crawlers. Malicious bots and human attackers ignore it entirely.

Disallowing a path in robots.txt does not:

  • Block access to the page
  • Require authentication
  • Hide the path from view
  • Prevent the page from being linked or shared

If a page should not be accessed by unauthorized users, it must be protected by authentication, not by a robots.txt entry.
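
You can see the "polite request" nature of the protocol in Python's standard library. `urllib.robotparser` is what a well-behaved crawler uses to ask permission; nothing in the protocol stops a client that simply never asks:

```python
from urllib import robotparser

# A polite crawler parses the rules and checks before fetching.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /admin"])

print(rp.can_fetch("*", "https://example.com/admin"))   # False
print(rp.can_fetch("*", "https://example.com/public"))  # True

# An attacker skips this step entirely: a plain HTTP request to
# /admin succeeds unless the *server* enforces authentication.
```

The `False` above is advisory. It only ever constrains clients that choose to consult it.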

Security best practices

1. Audit your current robots.txt

Check what your robots.txt currently reveals. [CQwerty Shield's robots.txt checker](/tools/robots-txt-checker) analyses your file and flags entries that may expose sensitive paths.

2. Remove sensitive paths

Do not list admin panels, API endpoints, or internal tools in robots.txt. Protect them with authentication instead.

3. Keep it minimal

A good robots.txt for most sites is simple:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```

An empty Disallow rule explicitly allows everything. Only add Disallow rules for paths that are public but not worth indexing (such as search result pages or print-friendly versions).

4. Protect sensitive areas properly

  • Admin panels: require authentication + IP allowlisting
  • API endpoints: require API keys or OAuth tokens
  • Staging environments: use a different domain or require VPN access
  • Backup files: do not store them in web-accessible directories
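
As a minimal sketch of what "protect with authentication" means server-side (the credentials and the `handle` helper are hypothetical, not a real framework): the admin path challenges every request that lacks valid HTTP Basic credentials, regardless of what robots.txt says.

```python
import base64

# Hypothetical credentials for illustration only.
VALID = base64.b64encode(b"admin:s3cret").decode()

def handle(path: str, headers: dict) -> int:
    """Return the HTTP status code for a request (sketch, not a framework)."""
    if path.startswith("/admin"):
        auth = headers.get("Authorization", "")
        if auth != f"Basic {VALID}":
            return 401  # challenge instead of serving the page
    return 200

print(handle("/admin/users", {}))                                   # 401
print(handle("/admin/users", {"Authorization": f"Basic {VALID}"}))  # 200
```

The point is that the check runs on the server for every request; an attacker who knows the path gains nothing by knowing it.
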

5. Use noindex instead

If you want a page to exist but not appear in search results, use a `<meta name="robots" content="noindex">` tag or an X-Robots-Tag HTTP header. This keeps the path out of search engines without advertising its existence in robots.txt.
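
The header variant can be sketched with nothing but the standard library's WSGI tooling (the app below is illustrative, not a production server): the page is served normally but carries `X-Robots-Tag: noindex`, so compliant search engines drop it from results while robots.txt never mentions the path.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    # Header equivalent of <meta name="robots" content="noindex">.
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("X-Robots-Tag", "noindex"),
    ])
    return [b"<p>Reachable by link, absent from search results.</p>"]

# Exercise the app without a real server:
captured = {}
environ = {}
setup_testing_defaults(environ)
app(environ, lambda status, headers: captured.update(status=status, headers=headers))
print(captured["headers"])
```

Unlike a robots.txt entry, this tells crawlers what to do with the page only once they have already fetched it; it never publishes a list of paths.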

Key takeaways

  • robots.txt is public. Everyone can read it, including attackers.
  • Listing sensitive paths in robots.txt is worse than not listing them at all.
  • Use authentication and access controls for real security, not robots.txt.
  • Keep your robots.txt minimal and audit it regularly.
  • [Check your robots.txt now](/tools/robots-txt-checker)

Ready to check your domain?

Run all 18 security checks in 2 minutes. Free, no signup required.

Check Your robots.txt