The Complete Guide to Ecommerce Robots.txt
How to configure robots.txt for your ecommerce store. Block the right paths, allow the right crawlers, and avoid the mistakes that kill your SEO.
Your robots.txt file is the first thing search engine crawlers read when they visit your store. Get it wrong, and you could be blocking Google from indexing your best pages — or letting it waste crawl budget on pages that shouldn't be indexed at all.
What robots.txt Does (and Doesn't Do)
Robots.txt tells crawlers which paths they're allowed to access. It does not prevent pages from appearing in search results — it prevents crawling. If other sites link to a page you've blocked in robots.txt, Google may still index the URL (just without content). To prevent indexing, use noindex meta tags or X-Robots-Tag headers instead.
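One caveat: a noindex directive only works if crawlers can actually fetch the page, so don't block a URL in robots.txt and also rely on a noindex tag on that same page, because Google will never see it. For reference, the two standard forms look like this:
<meta name="robots" content="noindex">
X-Robots-Tag: noindex
The meta tag goes in the page's HTML head; the header version is set in your server or app response and also works for non-HTML files like PDFs.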
The Default robots.txt Problem
Most ecommerce platforms ship with a default robots.txt that's either too permissive or too restrictive:
- Shopify: Blocks /admin, /cart, /checkout, and /orders out of the box. Reasonable defaults, but filter and sort URLs added by themes and apps can still create crawl waste
- WooCommerce: WordPress's default virtual robots.txt only blocks /wp-admin/, so cart, checkout, search, and filter URLs are all crawlable unless you add rules yourself
- BigCommerce: Has sensible defaults but doesn't account for custom URL patterns
- Magento: Ships with an extensive robots.txt but blocks some URLs that should be crawlable
What to Block
1. Internal Search Results
Disallow: /search
Disallow: /*?q=
Disallow: /*?s=
Internal search result pages are thin content with no unique value. They waste crawl budget and can create duplicate content issues when Google indexes hundreds of search queries.
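A quick sketch of how those patterns apply, using hypothetical URLs (your paths will vary by platform):
/search?q=red+dress (blocked by Disallow: /search)
/collections/dresses?q=red (blocked by Disallow: /*?q=)
/collections/dresses (still crawlable)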
2. Faceted Navigation and Filters
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?page=
A category page with 5 filters, each offering 10 options, can generate 10^5 = 100,000 URL combinations, and more once you add sort orders and pagination. Google will try to crawl all of them. Block filter parameters to keep crawl budget focused on canonical category pages.
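One subtlety worth checking against your own URLs: with Google-style wildcard matching, Disallow: /*?sort= only matches URLs where sort is the first query parameter, because the pattern requires a literal ?sort=. If your store appends parameters in varying order, a broader sketch like this catches both positions (adjust the names to your actual parameters):
Disallow: /*sort=
Disallow: /*filter=
The trade-off is that these match the text anywhere in the URL, so make sure no crawlable path happens to contain the same string.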
3. User Account Pages
Disallow: /account
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders
These are behind login anyway. No reason to let crawlers discover them.
4. Cart and Checkout
Disallow: /cart
Disallow: /checkout
Disallow: /thank-you
Transactional pages with no SEO value.
5. Tag Pages (if thin)
Disallow: /tagged/
Disallow: /tag/
Tag pages with only 1-2 products are thin content. If your tag pages have substantial, unique product collections, keep them crawlable.
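If only a handful of tag pages earn their keep, you can block the directory and carve out exceptions, since the most specific (longest) matching rule wins. A sketch, with a hypothetical tag slug:
Disallow: /tag/
Allow: /tag/organic-cotton/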
What to Allow
- All product pages — your money pages
- Category pages — important for navigation and rankings
- Blog posts — content marketing assets
- CSS, JS, and image files — Google needs these to render your pages properly. Never block them.
- Sitemap location — always include a Sitemap directive
The Sitemap Directive
Always include your sitemap URL in robots.txt:
Sitemap: https://yourstore.com/sitemap.xml
This helps crawlers discover your sitemap even if they haven't found it through other means. Place it at the bottom of the file, outside any User-agent blocks.
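Multiple Sitemap lines are allowed, so if your platform splits sitemaps by content type you can list each one, or simply point to the sitemap index. The URLs here are illustrative:
Sitemap: https://yourstore.com/sitemap_products.xml
Sitemap: https://yourstore.com/sitemap_collections.xml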
Common Mistakes
Blocking CSS and JavaScript
Some store owners block /assets/ or /static/ thinking it saves crawl budget. This prevents Google from rendering your pages properly. Googlebot needs to load your CSS and JS to see the page the way users do, which affects how it evaluates layout, mobile-friendliness, and overall page experience.
Using robots.txt for Security
Robots.txt is public. Anyone can read it. Don't put admin paths or sensitive URLs in there — you're just advertising them. Use authentication and proper access controls instead.
Blocking Your Own CDN
If your images are served from a CDN subdomain (like cdn.yourstore.com), make sure the CDN domain has a robots.txt that allows crawling. Otherwise, Google can't index your product images.
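A minimal robots.txt for an image CDN host can be as simple as the following, assuming the subdomain serves only public assets:
User-agent: *
Disallow:
An empty Disallow value blocks nothing, so everything on that host stays crawlable.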
Not Testing After Changes
A misplaced wildcard or wrong path in robots.txt can block crawlers from your entire store and tank your rankings. Always test changes with a robots.txt testing tool, such as the robots.txt report in Google Search Console, before deploying.
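Beyond Search Console, you can sanity-check a candidate file programmatically before it goes live. Here is a minimal sketch using Python's built-in urllib.robotparser; note that it implements the original robots.txt rules and does not evaluate Google-style wildcards (* and $) the way Googlebot does, so treat it as a basic check of literal paths. The file name and URLs are placeholders.
import urllib.robotparser

# Parse the candidate file from disk before deploying it
rp = urllib.robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# URLs that must stay crawlable
for url in ["https://yourstore.com/products/example-product",
            "https://yourstore.com/collections/dresses"]:
    assert rp.can_fetch("*", url), f"Unexpectedly blocked: {url}"

# URLs that should be blocked
for url in ["https://yourstore.com/cart", "https://yourstore.com/checkout"]:
    assert not rp.can_fetch("*", url), f"Unexpectedly allowed: {url}"

print("All robots.txt checks passed")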
Template for Ecommerce Stores
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders
Disallow: /search
Disallow: /*?q=
Disallow: /*?sort=
Disallow: /*?filter=
# Allow all assets
Allow: /assets/
Allow: /static/
Allow: /*.css
Allow: /*.js
Allow: /*.jpg
Allow: /*.png
Allow: /*.webp
Sitemap: https://yourstore.com/sitemap.xml
Customize this for your platform and URL structure. Use StoreVitals' free robots.txt analyzer to validate your configuration and catch common mistakes.