Dealing with Aggressive Bots and Crawlers
Some bots and crawlers are particularly aggressive and can put undue load on your servers, resulting in potential outages, degraded performance, or slow response times.
This document describes the available approaches for blocking these crawlers.
Using robots.txt
robots.txt is a web standard that tells crawlers, indexers, and bots how to behave. The file lists which web paths are and aren’t crawlable and can ask crawlers to limit how often they send requests (via the Crawl-delay directive).
Softrip maintains a regularly updated standard robots.txt with our recommended settings; reach out to your Softrip contact for the latest version.
Sample Softrip standard robots.txt:
User-agent: *
Crawl-delay: 5
Disallow: /apps/
Disallow: /appsint/
Disallow: /aspnet_client/
Disallow: /bin/
Disallow: /bin-BKP/
Disallow: /certificates/
Disallow: /cms/
Disallow: /crm/
Disallow: /Home/
Disallow: /includes/
Disallow: /pdf/
Disallow: /policy/
Disallow: /product/
Disallow: /res/
Disallow: /reservations/
Disallow: /rss/
Disallow: /shared/
Disallow: /softripnext/
Disallow: /STNAttach/
Disallow: /STNView/
Disallow: /stw/
Disallow: /stsw/
Disallow: /temp/
Disallow: /test/
Disallow: /testing/
Disallow: /view_invoice/
Disallow: /view_voucher/
Disallow: /webctrl_client/
Disallow: /groups/*
Allow: /groups/$
Disallow: /Cms/
Disallow: /cms/
Allow: /cms/xmlsitemap
# Block Amazon crawler
User-agent: Amazonbot
Disallow: /
# Block dotbot
User-agent: dotbot
Disallow: /
# Block Yandex
User-agent: Yandex
Disallow: /
# Block all Semrush crawlers/bots
User-agent: SemrushBot
Disallow: /
User-agent: SplitSignalBot
Disallow: /
User-agent: SiteAuditBot
Disallow: /
User-agent: SemrushBot-BA
Disallow: /
User-agent: SemrushBot-SI
Disallow: /
User-agent: SemrushBot-SWA
Disallow: /
User-agent: SemrushBot-CT
Disallow: /
User-agent: SemrushBot-BM
Disallow: /
# Block PetalBot
User-agent: PetalBot
Disallow: /
# Block Claude (LLM Scraper)
User-agent: ClaudeBot
Crawl-delay: 100
Disallow: /
# Block Common Crawl (LLM Scraper)
User-agent: CCBot
Crawl-delay: 100
Disallow: /
# Block GPT bot (OpenAI Scraper)
User-agent: GPTBot
Crawl-delay: 100
Disallow: /
# Block OAI-SearchBot (OpenAI Search Bot)
User-agent: OAI-SearchBot
Crawl-delay: 100
Disallow: /
# Block Facebook/Meta
User-agent: facebookexternalhit
Crawl-delay: 100
Disallow: /
# Block Facebook/Meta
User-agent: meta-externalagent
Crawl-delay: 100
Disallow: /
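Before publishing changes, it can help to sanity-check how a crawler will interpret the rules. The sketch below uses Python’s standard urllib.robotparser module against a placeholder domain (www.example.com is an assumption; substitute your own site). Note that this parser only handles simple prefix rules reliably, so wildcard entries such as Disallow: /groups/* may not be evaluated exactly as major crawlers would.

from urllib.robotparser import RobotFileParser

# Placeholder domain; point this at your own site's published robots.txt.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

# Blocked outright by the "User-agent: SemrushBot" / "Disallow: /" group.
print(rp.can_fetch("SemrushBot", "/reservations/"))  # False

# Allowed for general crawlers: "/" is not covered by any Disallow under "User-agent: *".
print(rp.can_fetch("Googlebot", "/"))  # True

# Crawl delay requested of general crawlers.
print(rp.crawl_delay("*"))  # 5

If the output does not match expectations, check which User-agent group the crawler falls into: a crawler follows only the most specific group that matches it, not the * group plus its own.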
Using IIS Request Filtering
Some bots and crawlers ignore robots.txt entirely (for example, Facebook’s facebookexternalhit). In addition, crawlers that do respect robots.txt may cache it for a long time, so a fix to that file may not take effect for hours.
In those cases, IIS can be configured to reject requests from specific user agents.
In your site’s root web.config, add the following section, with a denyStrings entry for each user agent you want to block:
<configuration>
  [...]
  <system.webServer>
    [...]
    <security>
      <requestFiltering>
        <filteringRules>
          <filteringRule name="Block Bots and Crawlers" scanUrl="false" scanQueryString="false">
            <scanHeaders>
              <add requestHeader="User-Agent" />
            </scanHeaders>
            <denyStrings>
              <add string="facebookexternalhit" /> <!-- Block Facebook crawler -->
              <add string="meta-externalagent" />  <!-- Block Meta/Facebook crawler -->
              <add string="GPTBot" />              <!-- Block OpenAI GPT crawler -->
              <add string="OAI-SearchBot" />       <!-- Block OpenAI search bot -->
            </denyStrings>
          </filteringRule>
        </filteringRules>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
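After deploying the change, you can confirm the rule works by sending requests with a spoofed User-Agent header and checking the responses. The sketch below is a minimal check using Python’s urllib against a placeholder URL (an assumption; substitute a page on your own site). Requests denied by a request-filtering rule are typically rejected by IIS with a 404 response (substatus 404.19), while a normal browser user agent should still receive a 200.

import urllib.error
import urllib.request

# Placeholder URL; point this at a page on your own site.
url = "https://www.example.com/"

for agent in ("GPTBot", "Mozilla/5.0"):
    req = urllib.request.Request(url, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req) as resp:
            print(agent, "->", resp.status)  # expected 200 for the browser user agent
    except urllib.error.HTTPError as err:
        print(agent, "->", err.code)  # expected 404 once the filtering rule matches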