Bot Detection Techniques

INTEGRATION SERVICES FOR WEBSITE SECURITY

Botnets are becoming more than an annoyance across the web today. Webmasters, ISPs and hosts continuously seek ways to detect and eliminate artificial activity from bots that present themselves as regular visitors.

Among the problems many site owners face are:

  • Content Scraping
  • Malware Infection
  • Spam infestation

Using botnets, attackers can access many sites with a small number of resources. A single system can access multiple sites at the same time, scrape content, attempt to inject arbitrary content into the server's database, and fill in forms in the hope of posting link spam. It can also propagate illegal content and entice visitors to a compromised website that steals their credentials and, eventually, takes over their systems to make them part of the botnet.

[Image: Form Accessed by a Human]

The problem with bot access is multilayered, as many areas are involved, from the visitor's system and its connection to the Internet up to the server hosting the website. Routers, PCs, installed server software and running applications are all candidates for attackers to exploit, given flaws in the software.

In this article we discuss one aspect of botnet detection: detection at the application level, on the server that hosts a website. The methods described here focus on HTML and CSS techniques applied to web forms. They can gather various data for storage and can differentiate a bot from a human visitor reliably. In addition, they can be implemented without changing the existing functionality of web pages.

Because it relies only on simple HTML and CSS methods, the implementation that differentiates bots from humans can be adapted to any application, regardless of the server programming language. So let's begin with web forms posted by visitors:

<form name="contact_form" action="/process_contact.php" method="post">
<div><input type="text" name="keywords" value="" size="10" maxlength="100" />
<input type="text" name="keywords2" class="hiddenset" /></div>
<div><input type="image" src="/button_submit.gif" name="button_submit" /></div>
</form>

In this sample form we have two text input fields, one hidden and one exposed. The purpose of the hidden second field is not bot detection; it forces the submit button to be flagged in the request when the visitor presses Enter or Return. This matters because browsers do not all treat a form with a single text box the same way. The code above covers all popular browsers, and the hidden input field is only required for forms with a single edit box. In other words, with a second input field present, even a hidden one, we can determine whether the form was submitted by pressing the submit button with the mouse pointer or by hitting Return. The hiddenset CSS class implements the display:none rule.

Now we need to implement the detection method against bots. We will modify the form above by adding a series of repeated submit buttons. All of them use the same image, but each one has a different name and CSS class. The series of submit buttons can be generated automatically by the script that outputs the form.

<form name="contact_form" action="/process_contact.php" method="post">
<div><input type="text" name="keywords" value="" size="10" maxlength="100" />
<input type="text" name="keywords2" class="hiddenset" /></div>
<div>
<input type="image" src="/button_submit.gif" name="button_submit0" class="biz0" />
<input type="image" src="/button_submit.gif" name="button_submit1" class="biz1" />
<input type="image" src="/button_submit.gif" name="button_submit2" class="biz2" />
<input type="image" src="/button_submit.gif" name="button_submit3" class="biz3" />
<input type="image" src="/button_submit.gif" name="button_submit4" class="biz4" />
<input type="image" src="/button_submit.gif" name="button_submit5" class="biz5" />
<input type="image" src="/button_submit.gif" name="button_submit6" class="biz6" />
<input type="image" src="/button_submit.gif" name="button_submit7" class="biz7" />
<input type="image" src="/button_submit.gif" name="button_submit8" class="biz8" />
<input type="image" src="/button_submit.gif" name="button_submit9" class="biz9" />
</div>
</form>

The second version of the form includes the series of buttons we talked about. Using CSS we can now hide all of the buttons except one. If we wanted to display the image whose input name is button_submit7, we would set every biz# class to display:none except biz7.
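The button series and its matching stylesheet can be produced by a short routine. The following is a minimal sketch in Python rather than the article's PHP; the function name render_submit_buttons and its parameters are hypothetical and chosen only for illustration. It emits one image button per index and a display:none rule for every class except a randomly chosen one.

```python
import html
import secrets


def render_submit_buttons(count, image_src="/button_submit.gif"):
    """Return (markup, css, visible_index) for a row of look-alike submit buttons."""
    visible_index = secrets.randbelow(count)  # the one button a human can see
    src = html.escape(image_src, quote=True)
    buttons = [
        f'<input type="image" src="{src}" name="button_submit{i}" class="biz{i}" />'
        for i in range(count)
    ]
    # Hide every class except the chosen one.
    rules = [
        f".biz{i} {{ display: none; }}"
        for i in range(count)
        if i != visible_index
    ]
    return "\n".join(buttons), "\n".join(rules), visible_index


markup, css, visible = render_submit_buttons(10)
print(markup)
print(css)
```

The returned visible_index has to be remembered on the server so that the processing code knows which button name to expect.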

[Image: Form Accessed by a Bot]

The form processing code can identify the button pressed because each button has its own name. When button_submit7 is pressed, the code can read both the name of the button and the click coordinates from the $_POST array: button_submit7_x and button_submit7_y are indices inside the $_POST array, and their values hold the coordinates at which the mouse pointer pressed the button. If the form is submitted by a keystroke instead, the coordinates will be 0 on both the x and y axes.
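As an illustration of reading the button name and coordinates from the posted fields, here is a minimal Python sketch. It assumes the posted fields are available as a plain dictionary; the function name pressed_buttons is hypothetical. Browsers send image button coordinates as name.x and name.y, and PHP exposes them with underscores, so the sketch accepts both spellings.

```python
import re

# Browsers send "<name>.x" / "<name>.y" for image buttons; PHP exposes them
# as "<name>_x" / "<name>_y", so both separators are accepted here.
_COORD_KEY = re.compile(r"^(button_submit\d+)[._]([xy])$")


def pressed_buttons(post):
    """Map each submitted button name to its (x, y) click coordinates."""
    coords = {}
    for key, value in post.items():
        match = _COORD_KEY.match(key)
        if not match:
            continue
        name, axis = match.groups()
        try:
            number = int(value)
        except (TypeError, ValueError):
            number = None  # malformed coordinate
        coords.setdefault(name, {})[axis] = number
    return {name: (c.get("x"), c.get("y")) for name, c in coords.items()}


sample = {"keywords": "hello", "button_submit7_x": "12", "button_submit7_y": "5"}
print(pressed_buttons(sample))  # {'button_submit7': (12, 5)}
```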

With a simple PHP check we can rule out the cases where a hidden button is pressed. If a hidden button such as button_submit0 appears in the request, the form processing code can simply discard the entire request. Visually there is no difference between the first form and the second. There are no new visible input fields and no questions the form prompts the visitor to answer.
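The check itself is small. The sketch below is written in Python rather than PHP and is an assumption about how such a check could look, not the article's own code; is_human_submission is a hypothetical name. It accepts a request only when the visible button, and no other button, appears among the posted fields.

```python
def is_human_submission(post, visible_button):
    """Accept only requests that name the visible button and nothing else."""
    pressed = set()
    for key in post:
        # Coordinate keys look like "button_submit7_x" (PHP) or "button_submit7.x".
        if key.startswith("button_submit") and key[-2:] in ("_x", "_y", ".x", ".y"):
            pressed.add(key[:-2])
    return pressed == {visible_button}


human = {"keywords": "hi", "button_submit7_x": "12", "button_submit7_y": "5"}
bot = {"keywords": "spam", "button_submit0_x": "0", "button_submit0_y": "0"}
print(is_human_submission(human, "button_submit7"))  # True
print(is_human_submission(bot, "button_submit7"))    # False
```

Which button a given browser reports for a keyboard submission is not covered by this sketch and is worth testing before relying on it.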

Now let's take a look at the extra processing and the probability factors a bot needs to overcome to submit the form successfully. The current form gives a bot that picks a button blindly a 1 in 10 chance of pressing the right one.
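The arithmetic can be made explicit. The sketch below assumes a bot that picks one button uniformly at random on each attempt and a valid button that is chosen independently for every attempt; both assumptions and the function name blind_guess_success are illustrative only.

```python
def blind_guess_success(buttons, attempts=1):
    """Chance that at least one of `attempts` random guesses hits the visible button."""
    return 1 - (1 - 1 / buttons) ** attempts


print(round(blind_guess_success(10), 4))     # 0.1
print(round(blind_guess_success(100), 4))    # 0.01
print(round(blind_guess_success(10, 5), 4))  # 0.4095
```

Under these assumptions repeated attempts raise the odds quickly, which is one reason to increase the number of buttons.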

With respect to CSS, identifying the class that contains the visible button is complicated for a bot. It needs to parse the HTML document to locate the CSS files. Then it needs to parse each file, as well as any styles in the HTML document itself, to identify the CSS classes. It also needs to reproduce the ordering and override rules that browsers apply. Writing bot code to process these details is extremely complicated, so let's assume a browser engine is used in the first place to process the CSS classes correctly. Even so, the rendered output cannot simply be shown to a piece of software; the bot still has to follow some logic to identify the hidden elements. The extra processing this requires makes form submission impractical for a typical bot.

Taking the form to the next level by integrating it into a website, we can take advantage of per-visitor sessions to map the button elements differently for each visitor. Adding timeouts and random functions, as well as increasing the number of hidden buttons, is a straightforward process. If we use 100 buttons instead of 10 and generate a different valid button for each page load, the form will be safe enough from submission by bots.
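A per-visitor mapping with a timeout might look like the following Python sketch. The session is assumed to be a dictionary-like object supplied by the web framework, and the names issue_form, check_form and FORM_TTL_SECONDS, as well as the ten-minute value, are hypothetical.

```python
import secrets
import time

FORM_TTL_SECONDS = 600  # hypothetical timeout for a rendered form


def issue_form(session, button_count=100, now=None):
    """Pick a fresh visible button for this page load and store it in the session."""
    now = time.time() if now is None else now
    session["visible_button"] = f"button_submit{secrets.randbelow(button_count)}"
    session["form_issued_at"] = now
    return session["visible_button"]


def check_form(session, pressed_button, now=None):
    """Validate one submission; the stored choice is consumed either way."""
    now = time.time() if now is None else now
    expected = session.pop("visible_button", None)
    issued_at = session.pop("form_issued_at", None)
    if expected is None or issued_at is None:
        return False  # no form was issued, or it was already used
    if now - issued_at > FORM_TTL_SECONDS:
        return False  # the form expired
    return pressed_button == expected


session = {}
visible = issue_form(session, now=1000.0)
print(check_form(session, visible, now=1030.0))  # True
print(check_form(session, visible, now=1031.0))  # False
```

Consuming the stored button on every check means a bot cannot retry against the same page load; it has to request the form again and face a new random choice.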

One important note: do not make the mistake of using the detection process described here as a means to identify bots and then using information such as IP addresses to block access or ban users. Using this concept as a honeypot may hurt your traffic and have the opposite effect. The reason is that once a honeypot is identified, an attacker can lead legitimate traffic, including search engines, into it, thus ruining your traffic.

Taking server programming languages like PHP into consideration, the CSS and HTML can be generated on the fly and tracked thereafter. The session can be used to seed a random generator and expose varying CSS classes, obscuring the stylesheet from any bot that attempts to parse it. Examples include reordering and repeating the biz# classes, or repeating the display property of a class with alternating values such as none and inline, so that only the last declaration takes effect.
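One way to picture such a generated stylesheet is the Python sketch below. It is an illustration under stated assumptions, not the article's module: the function name obfuscated_css is hypothetical, the seed is assumed to be derived from the visitor's session, and random.Random is used only because it is reproducible from a seed, not because it is cryptographically strong. Decoy rules with arbitrary display values come first, and one final rule per class, placed after all decoys, decides the real visibility because later rules of equal specificity win.

```python
import random


def obfuscated_css(session_seed, count=10):
    """Return (css, visible_index); the last rule per class decides visibility."""
    rng = random.Random(session_seed)  # reproducible per session, not cryptographic
    visible_index = rng.randrange(count)
    decoys = []
    final = []
    for i in range(count):
        # Decoy rules with arbitrary values, overridden by the final rule below.
        for _ in range(rng.randint(1, 3)):
            value = rng.choice(["none", "inline"])
            decoys.append(f".biz{i} {{ display: {value}; }}")
        value = "inline" if i == visible_index else "none"
        final.append(f".biz{i} {{ display: {value}; }}")
    rng.shuffle(decoys)
    rng.shuffle(final)
    # Equal specificity: later rules win, so the final rules follow the decoys.
    return "\n".join(decoys + final), visible_index


css, visible = obfuscated_css("session-abc")
print(css)
```

Because the generator is seeded from the session, the processing code can recompute which class is visible instead of storing it.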

The anti-bot verification system employs all of the techniques described here. The module can deny form submissions from all known bots.

Copyright © 2003-2012 Asymmetric Software - All rights reserved.
 
 