About this tool
The Robots.txt Analyzer checks your robots.txt file for common problems and optimization opportunities. Here's what it looks for:
- Syntax validation: Makes sure your file is properly formatted with valid directives
- Critical SEO issues: Catches rules that accidentally block your entire site or homepage from search engines
- Resource blocking: Warns you if CSS, JavaScript, or images are blocked (this hurts your SEO)
- User-agent configuration: Checks that your user-agent directives are set up correctly
- Sitemap presence: Looks for sitemap declarations and makes sure the URLs are valid
- Crawl-delay optimization: Flags crawl-delay values high enough to slow down indexing
- Best practice recommendations: Gives you actionable suggestions with specific line numbers
You'll get a health score from 0 to 100 based on how well your file is configured, along with categorized issues, detailed rule breakdowns, and tips for improvement. This helps you avoid accidentally blocking search engines while still keeping sensitive areas of your site protected.
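Several of these checks can be reproduced with Python's standard-library `urllib.robotparser`. A minimal sketch (the robots.txt contents and domain below are illustrative, not a real site's file):

```python
from urllib import robotparser

# An illustrative robots.txt to analyze (placeholder content)
ROBOTS = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Critical SEO check: is the homepage crawlable for a generic bot?
homepage_ok = rp.can_fetch("*", "https://example.com/")

# Sitemap presence check (site_maps() needs Python 3.8+; returns None if absent)
has_sitemap = rp.site_maps() is not None

print(homepage_ok, has_sitemap)  # True True
```

A full analyzer layers more checks (syntax, resource blocking, crawl-delay) on top of this kind of parsing, but homepage access and sitemap presence are the two highest-impact ones to verify first.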
Perfect robots.txt Example
A well-configured robots.txt file should be clear, purposeful, and avoid common pitfalls. Here's an example of a properly structured robots.txt that follows best practices:
# Allow all search engines to crawl the entire site
User-agent: *
Disallow:
# Block sensitive areas from all crawlers
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Allow access to CSS and JavaScript (important for SEO)
Allow: /css/
Allow: /js/
Allow: /assets/
# Specific rules for GPTBot (OpenAI's crawler)
User-agent: GPTBot
Disallow: /api/
Allow: /
# Sitemap location (helps search engines discover your content)
Sitemap: https://example.com/sitemap.xml
# Optional: Crawl-delay for all crawlers (use sparingly; many crawlers ignore it)
User-agent: *
Crawl-delay: 1
Key principles demonstrated:
- Default allow: Start with an empty Disallow: to let all crawlers access your site by default
- Explicit blocking: Only block specific sensitive directories like /admin/, /api/, /private/
- Allow CSS/JS: Never block /css/, /js/, or /assets/ because search engines need these to properly render your pages
- Include sitemap: Always tell search engines where your sitemap is so they can discover your content
- Specific user-agents: Set up targeted rules for certain bots (like GPTBot) when you need to
- Minimal crawl-delay: Keep crawl-delay under 5 seconds or skip it entirely since most modern crawlers handle this on their own
- Comments for clarity: Use # to add helpful comments that explain what each rule does
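The allow/deny behavior these principles produce can be sanity-checked with `urllib.robotparser` against a simplified subset of the example (a sketch; the URLs and bot name "SomeBot" are placeholders):

```python
from urllib import robotparser

# Simplified subset of the example above, one rule group per agent
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Allow: /css/

User-agent: GPTBot
Disallow: /api/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("SomeBot", "https://example.com/"))        # True: default allow
print(rp.can_fetch("SomeBot", "https://example.com/admin/"))  # False: explicit block
print(rp.can_fetch("GPTBot", "https://example.com/api/data")) # False: targeted rule
```

Note that different crawlers resolve overlapping rules differently (Google uses most-specific match, Python's parser uses first match), which is one more reason to keep each rule group small and unambiguous.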
Frequently Asked Questions
What is robots.txt?
robots.txt is a text file that tells search engine crawlers which pages they can and cannot access on your website. It's placed in your website's root directory (e.g., https://example.com/robots.txt) and follows the Robots Exclusion Protocol. This file helps you control how search engines interact with your content.
Why should I analyze my robots.txt file?
A misconfigured robots.txt can accidentally block search engines from indexing your entire site, preventing it from appearing in search results. Common issues include blocking CSS/JavaScript files (which affects SEO), overly restrictive rules, syntax errors, and missing sitemap declarations. Regular analysis helps catch these problems before they impact your visibility.
What does the health score mean?
The health score (0-100) reflects your robots.txt configuration quality:
- 90-100 (Excellent): Well-configured with no critical issues
- 70-89 (Good): Minor improvements possible but generally healthy
- 50-69 (Needs Improvement): Several issues that should be addressed
- 0-49 (Critical Issues): Serious problems blocking search engines
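As a purely illustrative sketch (the analyzer's actual scoring logic isn't published here), the bands above map to labels like this:

```python
def health_band(score: int) -> str:
    """Map a 0-100 health score to a band label (illustrative only)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Needs Improvement"
    return "Critical Issues"

print(health_band(95))  # Excellent
print(health_band(42))  # Critical Issues
```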
What are critical issues?
Critical issues prevent search engines from indexing your site properly. Examples include:
- Disallow: / - Blocks your entire website from all search engines
- Blocking homepage - Prevents indexing of your main page
- No User-agent directives - Invalid robots.txt structure
These should be fixed immediately to maintain search visibility.
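The impact of `Disallow: /` is easy to demonstrate with `urllib.robotparser` (a sketch with a placeholder domain):

```python
from urllib import robotparser

# A robots.txt that blocks the entire site for every crawler
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every URL, including the homepage, is now off-limits
print(rp.can_fetch("Googlebot", "https://example.com/"))          # False
print(rp.can_fetch("Bingbot", "https://example.com/products/1"))  # False
```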
Should I block CSS and JavaScript files?
No. Google and other modern search engines need to render your pages to understand content and usability. Blocking CSS/JavaScript files can hurt your SEO by preventing proper page rendering. Remove rules like Disallow: /css/ or Disallow: /js/ from your robots.txt.
What is crawl-delay and should I use it?
Crawl-delay tells bots to wait a specified number of seconds between requests. While it can reduce server load, values above 10 seconds may significantly slow down indexing. Most modern search engines ignore this directive in favor of automatic rate limiting. Use with caution and keep values low (≤5 seconds) if needed.
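`urllib.robotparser` can read `Crawl-delay` too, which makes it easy to flag high values (a sketch; the 5-second threshold mirrors the guidance above):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Crawl-delay: 20", "Disallow: /tmp/"])

delay = rp.crawl_delay("*")  # None if no Crawl-delay is declared
print(delay)  # 20
if delay is not None and delay > 5:
    print("Crawl-delay is high; it may slow down indexing")
```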