The Complete Guide to Ecommerce Robots.txt
How to configure robots.txt for your ecommerce store. Block the right paths, allow the right crawlers, and avoid the mistakes that kill your SEO.
Your robots.txt file is the first thing search engine crawlers read when they visit your store. Get it wrong, and you could be blocking Google from indexing your best pages — or letting it waste crawl budget on pages that shouldn't be indexed at all.
What robots.txt Does (and Doesn't Do)
Robots.txt tells crawlers which paths they're allowed to access. It does not prevent pages from appearing in search results — it prevents crawling. If other sites link to a page you've blocked in robots.txt, Google may still index the URL (just without content). To prevent indexing, use noindex meta tags or X-Robots-Tag headers instead.
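One caveat: a noindex directive only works if crawlers can actually fetch the page, so don't block a URL in robots.txt and also rely on a noindex tag on that same page, because Google will never see it. For reference, the two standard forms look like this:
<meta name="robots" content="noindex">
X-Robots-Tag: noindex
The meta tag goes in the page's HTML head; the header version is set in your server or app response and also works for non-HTML files like PDFs.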
The Default robots.txt Problem
Most ecommerce platforms ship with a default robots.txt that's either too permissive or too restrictive:
- Shopify: Blocks /admin, /cart, /checkout, and /orders out of the box. Reasonable defaults, but filter and sort URLs added by themes and apps can still create crawl waste
- WooCommerce: WordPress's default virtual robots.txt only blocks /wp-admin/, so cart, checkout, search, and filter URLs are all crawlable unless you add rules yourself
- BigCommerce: Has sensible defaults but doesn't account for custom URL patterns
- Magento: Ships with an extensive robots.txt but blocks some URLs that should be crawlable
What to Block
1. Internal Search Results
Disallow: /search
Disallow: /*?q=
Disallow: /*?s=
Internal search result pages are thin content with no unique value. They waste crawl budget and can create duplicate content issues when Google indexes hundreds of search queries.
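A quick sketch of how those patterns apply, using hypothetical URLs (your paths will vary by platform):
/search?q=red+dress (blocked by Disallow: /search)
/collections/dresses?q=red (blocked by Disallow: /*?q=)
/collections/dresses (still crawlable)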
2. Faceted Navigation and Filters
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?page=
A category page with 5 filters, each offering 10 options, can generate 10^5 = 100,000 URL combinations, and more once you add sort orders and pagination. Google will try to crawl all of them. Block filter parameters to keep crawl budget focused on canonical category pages.
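One subtlety worth checking against your own URLs: with Google-style wildcard matching, Disallow: /*?sort= only matches URLs where sort is the first query parameter, because the pattern requires a literal ?sort=. If your store appends parameters in varying order, a broader sketch like this catches both positions (adjust the names to your actual parameters):
Disallow: /*sort=
Disallow: /*filter=
The trade-off is that these match the text anywhere in the URL, so make sure no crawlable path happens to contain the same string.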
3. User Account Pages
Disallow: /account
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders
These are behind login anyway. No reason to let crawlers discover them.
4. Cart and Checkout
Disallow: /cart
Disallow: /checkout
Disallow: /thank-you
Transactional pages with no SEO value.
5. Tag Pages (if thin)
Disallow: /tagged/
Disallow: /tag/
Tag pages with only 1-2 products are thin content. If your tag pages have substantial, unique product collections, keep them crawlable.
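If only a handful of tag pages earn their keep, you can block the directory and carve out exceptions, since the most specific (longest) matching rule wins. A sketch, with a hypothetical tag slug:
Disallow: /tag/
Allow: /tag/organic-cotton/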
What to Allow
- All product pages — your money pages
- Category pages — important for navigation and rankings
- Blog posts — content marketing assets
- CSS, JS, and image files — Google needs these to render your pages properly. Never block them.
- Sitemap location — always include a Sitemap directive
The Sitemap Directive
Always include your sitemap URL in robots.txt:
Sitemap: https://yourstore.com/sitemap.xml
This helps crawlers discover your sitemap even if they haven't found it through other means. Place it at the bottom of the file, outside any User-agent blocks.
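Multiple Sitemap lines are allowed, so if your platform splits sitemaps by content type you can list each one, or simply point to the sitemap index. The URLs here are illustrative:
Sitemap: https://yourstore.com/sitemap_products.xml
Sitemap: https://yourstore.com/sitemap_collections.xml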
Common Mistakes
Blocking CSS and JavaScript
Some store owners block /assets/ or /static/ thinking it saves crawl budget. This prevents Google from rendering your pages properly. Googlebot needs to load your CSS and JS to see the page the way users do, which affects how it evaluates layout, mobile-friendliness, and overall page experience.
Using robots.txt for Security
Robots.txt is public. Anyone can read it. Don't put admin paths or sensitive URLs in there — you're just advertising them. Use authentication and proper access controls instead.
Blocking Your Own CDN
If your images are served from a CDN subdomain (like cdn.yourstore.com), make sure the CDN domain has a robots.txt that allows crawling. Otherwise, Google can't index your product images.
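A minimal robots.txt for an image CDN host can be as simple as the following, assuming the subdomain serves only public assets:
User-agent: *
Disallow:
An empty Disallow value blocks nothing, so everything on that host stays crawlable.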
Not Testing After Changes
A misplaced wildcard or wrong path in robots.txt can block crawlers from your entire store and tank your rankings. Always test changes with a robots.txt testing tool, such as the robots.txt report in Google Search Console, before deploying.
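Beyond Search Console, you can sanity-check a candidate file programmatically before it goes live. Here is a minimal sketch using Python's built-in urllib.robotparser; note that it implements the original robots.txt rules and does not evaluate Google-style wildcards (* and $) the way Googlebot does, so treat it as a basic check of literal paths. The file name and URLs are placeholders.
import urllib.robotparser

# Parse the candidate file from disk before deploying it
rp = urllib.robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# URLs that must stay crawlable
for url in ["https://yourstore.com/products/example-product",
            "https://yourstore.com/collections/dresses"]:
    assert rp.can_fetch("*", url), f"Unexpectedly blocked: {url}"

# URLs that should be blocked
for url in ["https://yourstore.com/cart", "https://yourstore.com/checkout"]:
    assert not rp.can_fetch("*", url), f"Unexpectedly allowed: {url}"

print("All robots.txt checks passed")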
Template for Ecommerce Stores
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders
Disallow: /search
Disallow: /*?q=
Disallow: /*?sort=
Disallow: /*?filter=
# Allow all assets
Allow: /assets/
Allow: /static/
Allow: /*.css
Allow: /*.js
Allow: /*.jpg
Allow: /*.png
Allow: /*.webp
Sitemap: https://yourstore.com/sitemap.xml
Customize this for your platform and URL structure. Use StoreVitals' free robots.txt analyzer to validate your configuration and catch common mistakes.