...
Using robots.txt
robots.txt is a web standard that tells crawlers, indexers, and bots how to behave. The file lists directives describing which paths on the site may and may not be crawled, and it can ask crawlers to limit how frequently they send requests.
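The directives are plain text served from the site root at /robots.txt. As a minimal illustration of the directive types used in the full file below (the paths, bot name, and delay value here are placeholders, not recommendations):

```
User-agent: *            # rules for every crawler that reads robots.txt
Crawl-delay: 5           # ask for at least 5 seconds between fetches (not all bots honor this)
Disallow: /private/      # do not crawl anything under /private/
Allow: /private/help/    # except this subtree

User-agent: ExampleBot   # rules for one specific crawler, matched by name
Disallow: /              # deny it the entire site
```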
...
```
User-agent: *
Crawl-delay: 5
Disallow: /apps/
Disallow: /appsint/
Disallow: /aspnet_client/
Disallow: /bin/
Disallow: /bin-BKP/
Disallow: /certificates/
Disallow: /cms/
Disallow: /crm/
Disallow: /Home/
Disallow: /includes/
Disallow: /pdf/
Disallow: /policy/
Disallow: /product/
Disallow: /res/
Disallow: /reservations/
Disallow: /rss/
Disallow: /shared/
Disallow: /softripnext/
Disallow: /STNAttach/
Disallow: /STNView/
Disallow: /stw/
Disallow: /stsw/
Disallow: /temp/
Disallow: /test/
Disallow: /testing/
Disallow: /view_invoice/
Disallow: /view_voucher/
Disallow: /webctrl_client/
Disallow: /groups/*
Allow: /groups/$
Disallow: /Cms/
Disallow: /cms/
Allow: /cms/xmlsitemap

# Block Amazon crawler
User-agent: Amazonbot
Disallow: /

# Block dotbot
User-agent: dotbot
Disallow: /

# Block Yandex
User-agent: Yandex
Disallow: /

# Block all Semrush crawlers/bots
User-agent: SemrushBot
Disallow: /

User-agent: SplitSignalBot
Disallow: /

User-agent: SiteAuditBot
Disallow: /

User-agent: SemrushBot-BA
Disallow: /

User-agent: SemrushBot-SI
Disallow: /

User-agent: SemrushBot-SWA
Disallow: /

User-agent: SemrushBot-CT
Disallow: /

User-agent: SemrushBot-BM
Disallow: /

# Block PetalBot
User-agent: PetalBot
Disallow: /

# Block Claude (LLM Scraper)
User-agent: ClaudeBot
Crawl-delay: 100
Disallow: /

# Block Common Crawl (LLM Scraper)
User-agent: CCBot
Crawl-delay: 100
Disallow: /

# Block GPT bot (OpenAI Scraper)
User-agent: GPTBot
Crawl-delay: 100
Disallow: /

# Block OAI-SearchBot (OpenAI Search Bot)
User-agent: OAI-SearchBot
Crawl-delay: 100
Disallow: /

# Block Facebook/Meta
User-agent: facebookexternalhit
Crawl-delay: 100
Disallow: /

# Block Facebook/Meta
User-agent: meta-externalagent
Crawl-delay: 100
Disallow: /
```
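A quick way to confirm the rules parse the way you expect is Python's built-in urllib.robotparser. A minimal sketch, assuming a placeholder site URL (note that Python applies Allow/Disallow rules in file order, so precedence can differ slightly from crawlers that use longest-match rules):

```python
# Sanity-check the published robots.txt with Python's standard library.
# This only evaluates the rules a compliant crawler would see; it cannot
# force a non-compliant bot to obey them.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://www.example.com/robots.txt"  # placeholder URL

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetch and parse the live file

# User-agent / path pairs to spot-check; adjust to the rules you care about.
checks = [
    ("GPTBot", "/"),                 # fully disallowed above
    ("ClaudeBot", "/product/x"),     # fully disallowed above
    ("Googlebot", "/"),              # falls under the wildcard group; the root is allowed
    ("Googlebot", "/reservations/"), # disallowed by the wildcard group
]

for agent, path in checks:
    allowed = parser.can_fetch(agent, path)
    delay = parser.crawl_delay(agent)
    print(f"{agent:12s} {path:18s} allowed={allowed} crawl-delay={delay}")
```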
Using IIS Request Filtering
...
```xml
<configuration>
  [...]
  <system.webServer>
    [...]
    <security>
      <requestFiltering>
        <filteringRules>
          <filteringRule name="Block Bots and Crawlers" scanUrl="false" scanQueryString="false">
            <scanHeaders>
              <add requestHeader="User-Agent" />
            </scanHeaders>
            <denyStrings>
              <add string="facebookexternalhit" /> <!-- Block Facebook crawler (DDoS mitigation) -->
              <add string="meta-externalagent" />  <!-- Block Meta/Facebook crawler -->
              <add string="GPTBot" />              <!-- Block OpenAI GPT crawler -->
              <add string="OAI-SearchBot" />       <!-- Block OpenAI search crawler -->
            </denyStrings>
          </filteringRule>
        </filteringRules>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```
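Once the rule is in place, it can be spot-checked from outside by sending requests with a blocked User-Agent string; by default, Request Filtering rejects matching requests with an HTTP 404 (the detailed substatus appears only in the IIS logs). A minimal sketch, assuming a placeholder hostname:

```python
# Probe the site with different User-Agent headers and report the status code.
# The hostname is a placeholder; blocked agents should come back as HTTP 404.
import urllib.error
import urllib.request

SITE = "https://www.example.com/"  # placeholder

def probe(user_agent: str) -> int:
    """Request the site root with the given User-Agent and return the HTTP status."""
    req = urllib.request.Request(SITE, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

for agent in ("Mozilla/5.0 (ordinary browser)", "GPTBot/1.0", "facebookexternalhit/1.1"):
    print(f"{agent:35s} -> HTTP {probe(agent)}")
```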