
Banning Bad Bots Using The global.asa File In Classic ASP

Bad bots can cause problems for your website. They can submit spam to your forum or blog, spam your contact form, or simply use up valuable resources such as bandwidth and CPU. If you use Classic ASP, this article will show you how to ban bad bots from your entire website using the global.asa file.

WSI was recently asked by one of our longstanding clients to investigate why their website had started using a much greater amount of bandwidth than expected. Our client was already utilising our web analytics services, so our first port of call was their Google Analytics account, where we could investigate their web traffic.

Identify the traffic source

We soon identified that large numbers of page views were being generated from a Google AdWords campaign that was no longer active. Naturally this aroused our suspicion, because an inactive pay per click (PPC) campaign should not be generating any traffic at all! Our next course of action was to analyse the website’s server logs in order to further identify the source of the traffic. We quickly isolated the page views as being generated by a bot. A bot is a software application that runs automated tasks over the Internet. The largest use of bots is in web spidering by search engines and the like.

The bad bot

This is exactly what the bot in question was doing: loading our client’s page an average of once every 10 minutes, 24 hours a day. A page load every 10 minutes isn’t a problem in terms of server load; however, this bot was also requesting all of the page media, such as images and JavaScript files, and therefore wasting our client’s resources. We then had to choose the most appropriate way to ban the bot.

Banning the bad bot

Our client’s website is built in Classic ASP and hosted on a Microsoft IIS/6.0 server, so that dictated the methods we could use to ban the nuisance bot.

robots.txt

We assumed from the start that the bot would not obey any exclusion set up via the robots.txt file, so we ignored that option completely.
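
For comparison, a well-behaved bot could be excluded with a simple robots.txt rule like the sketch below (the user-agent name here is just a placeholder); a bad bot simply ignores the file, which is why we discounted this option.

User-agent: BadBotUserAgent
Disallow: /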

ASP code at page level

It would be possible to insert ASP code into individual pages in order to stop the pages loading if they were requested by the bot in question. However, this would mean editing multiple existing pages and then monitoring the website to check whether further pages started being affected by the bot. There are more suitable ways of achieving our goal using the IIS/6.0 web server itself.
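
To illustrate the page-level approach, a check like the sketch below could be added to the top of each .asp page, or to a shared include file (the user-agent string is the same placeholder used later in this article):

<%
' Hypothetical page-level check: it would need to be added to every page the bot requests
If Request.ServerVariables("HTTP_USER_AGENT") = "insert bad bot user-agent here" Then
    Response.End  ' stop processing so no HTML is returned to the bot
End If
%>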

global.asa

IIS/6.0 allows for an optional file called global.asa, which can contain scripts accessible to every page in an ASP application. If we edit our global.asa file to block the bot, it will be immediately banned from every page on the website. (Note that this does not cover static content such as images or JavaScript files, which IIS serves without running any ASP code.)

Identify the bad bot

There are two main ways in which a bot can be identified: by its IP address or by its user-agent. (A user-agent is a text string that identifies the client application making the request.) Normally it’s considered best practice to identify and ban a bot based on its IP address rather than its user-agent, because a user-agent can easily be spoofed. Bad bots often originate from one or a small number of IPs or networks, and identifying these is usually the preferred method of blocking. In this case, however, our client’s server logs showed that the bot came from a wide range of IP addresses on different networks, while, as far as we could see, it always identified itself with a consistent, legitimate user-agent string. We therefore decided that the best way to combat this particular bot would be to ban it based on its user-agent.
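
Both identifiers are available to ASP code through the Request.ServerVariables collection. As a rough illustration (the IP address and user-agent values below are placeholders, not the real bot’s details):

Dim visitorIP, visitorUA
visitorIP = Request.ServerVariables("REMOTE_ADDR")      ' the client's IP address
visitorUA = Request.ServerVariables("HTTP_USER_AGENT")  ' the client's user-agent string

If visitorIP = "192.0.2.1" Then
    ' ban based on IP address
End If

If visitorUA = "insert bad bot user-agent here" Then
    ' ban based on user-agent
End If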

Code to ban the bad bot using global.asa

Banning the bad bot using the global.asa file is fairly straightforward. The global.asa file has four events: Application_OnStart, Session_OnStart, Session_OnEnd and Application_OnEnd. By adding the bot-blocking code to the Session_OnStart event, it is executed whenever a user (including the bot) starts a session. We used the following code:

If Request.ServerVariables("HTTP_USER_AGENT") = "insert bad bot user-agent here" Then
    ' Drop the bot's session and stop processing so it receives an empty response
    Session.Abandon
    Response.End()
End If

This code identifies the bot based on its user-agent, drops the session with Session.Abandon and stops script execution with Response.End(). If we didn’t drop the session, the bot could accept the session cookie and request the page again; if it did, the page would be served normally, because the bot-blocking code is only executed when a new session is started. By dropping the session we ensure that a new session is started every time the bot requests a page.

Using Response.End() means that no HTML ever reaches the bot. This serves two purposes: it reduces bandwidth by not sending unnecessary HTML, and it means the bot never receives the locations of other resources within the HTML response. As mentioned earlier, images, JavaScript files and so on can still be requested directly, but if the bot receives no HTML from its initial request it shouldn’t know where to find those resources.
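
To show where that snippet lives, here is a sketch of a complete global.asa file with the check wrapped inside the Session_OnStart event (the other three event handlers are shown empty, and the user-agent string remains a placeholder):

<script language="VBScript" runat="Server">

Sub Application_OnStart
End Sub

Sub Session_OnStart
    ' Ban the bad bot before any page code runs
    If Request.ServerVariables("HTTP_USER_AGENT") = "insert bad bot user-agent here" Then
        Session.Abandon   ' drop the session so the next request starts a new one
        Response.End()    ' stop processing so no HTML reaches the bot
    End If
End Sub

Sub Session_OnEnd
End Sub

Sub Application_OnEnd
End Sub

</script>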

Conclusions

Initial indications show that the bot ban has worked very well, with our client’s bandwidth coming back down to normal levels. The bot-banning code could be further refined and extended (a rough sketch combining some of these ideas follows the list):

  • We could look for specific keywords or phrases within the user-agent string rather than checking the whole string
  • We could send a “403 Forbidden” HTTP response rather than “200 OK”, which would be the technically correct thing to do when banning a visitor
  • We could have a list of bad bots that should be banned, rather than just identifying one bot
  • We could ban bots based on both their user-agents and IP addresses
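
As a rough sketch of how some of those refinements might fit together inside Session_OnStart (the keyword list and status handling below are illustrative only, not code we deployed for this client):

Sub Session_OnStart
    Dim badBots, ua, i
    ' Illustrative list of user-agent keywords to block
    badBots = Array("badbot", "nastycrawler", "spambot")
    ua = LCase(Request.ServerVariables("HTTP_USER_AGENT"))

    For i = 0 To UBound(badBots)
        ' Keyword match rather than comparing the whole user-agent string
        If InStr(ua, badBots(i)) > 0 Then
            Response.Status = "403 Forbidden"   ' send the technically correct response
            Session.Abandon
            Response.End()
        End If
    Next
End Sub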

Perhaps we’ll investigate these possibilities in another post!
