While working on our fraud DB project, we got stuck on the problem of identifying users without using their IP address, for privacy reasons. But there are several other reasons not to rely solely on the IP address. What happens if someone uses a VPN or proxy service? What about public hotspots, which are becoming more and more common? We therefore concluded that we needed a different approach, one that focuses on the device rather than the IP address.
What is a (web) browser?
A browser is a software application used to locate, retrieve and display content. In the client/server model, the browser is the client running on the user's computer. It contacts the server and requests information. The server sends the information back to the browser, which displays the result.
We discovered that browsers are highly distinctive. The individual combination of installed plugins and add-ons (such as AdBlock, Norton or Firebug), fonts, languages and so on forms a very unique fingerprint. In fact, a fingerprint lets you identify one specific browser with about 90% certainty.
Crazy! Who would need such a fancy technique?
User identification is valuable to many parties: the affiliate industry, market research firms, IT security companies and, of course, our dear friends the intelligence agencies.
In addition, browser fingerprinting may be the first step towards replacing cookies and the accompanying risk of session hijacking. Until now, cookies have been necessary to work around the limitations of HTTP, a stateless protocol. In short, browser fingerprinting could be a significant gain for IT security.
What about the technical implementation?
For a detailed look at the technical implementation, see Henning Tillmann's thesis. The following section gives a basic overview:
The principle is very simple: grab all the information you can collect about the browser and combine it with an algorithm. PHP already provides useful functions such as apache_request_headers, which fetches all HTTP request headers from the current request and returns them as an associative array. If you don't run PHP as an Apache 2 module, you can use the following fallback, which does the same:
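<?php
// Fallback for setups where PHP does not run as an Apache module
// and apache_request_headers() is therefore not defined.
if (!function_exists('apache_request_headers')) {
    function apache_request_headers() {
        $headers = array();
        foreach ($_SERVER as $key => $val) {
            // Request headers appear in $_SERVER as HTTP_<NAME>,
            // e.g. "Accept-Language" becomes HTTP_ACCEPT_LANGUAGE.
            if (strpos($key, 'HTTP_') !== 0) {
                continue;
            }
            // The original letter case is lost, so rebuild the usual form:
            // ACCEPT_LANGUAGE -> Accept-Language (works in most cases).
            $parts = explode('_', strtolower(substr($key, 5)));
            $name = implode('-', array_map('ucfirst', $parts));
            $headers[$name] = $val;
        }
        // Content-Type and Content-Length are stored without the HTTP_ prefix.
        if (isset($_SERVER['CONTENT_TYPE'])) {
            $headers['Content-Type'] = $_SERVER['CONTENT_TYPE'];
        }
        if (isset($_SERVER['CONTENT_LENGTH'])) {
            $headers['Content-Length'] = $_SERVER['CONTENT_LENGTH'];
        }
        return $headers;
    }
}
?>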
(credits to limalopex.eisfux.de)
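To illustrate the "combine" step, here is a minimal sketch that hashes a few request headers into a single fingerprint. The function name header_fingerprint and the selection of headers are our own assumptions for illustration, not the algorithm used in the project. Headers alone are also far less distinctive than the full plugin and font profile, which has to be collected on the client side.

```php
<?php
// Hypothetical sketch: derive a fingerprint hash from selected request headers.
// The header selection is an illustrative assumption; a real fingerprint would
// also include client-side data such as plugins, fonts and screen size.
function header_fingerprint(array $headers): string {
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    // Assumed set of headers that tend to differ between browsers.
    $fields = array('user-agent', 'accept', 'accept-language', 'accept-encoding');
    $parts = array();
    foreach ($fields as $field) {
        // A missing header contributes an empty value, so the field order stays fixed.
        $parts[] = $field . '=' . ($headers[$field] ?? '');
    }
    // Join the fields in a fixed order and hash the result.
    return hash('sha256', implode("\n", $parts));
}
```

On a live request you would pass the result of apache_request_headers() (or the fallback above). Because the fields are read in a fixed order, the same headers always produce the same hash, regardless of the order in which the server reports them.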
Does the browser fingerprint change over time?
Yes, indeed. In more than half of the cases, the fingerprint actually changed. Users may install or uninstall plugins or other software, and their time zone settings or location may change. Through predictive behavioral analysis, however, we could design an algorithm that calculates such likely changes. We predict them up to a certain level of abstraction in order to improve our results.
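One simple way to tolerate such changes is to compare fingerprints attribute by attribute and accept a match once enough attributes agree. This is a hypothetical sketch, not the prediction algorithm described above; the attribute names and the 0.8 threshold are illustrative assumptions.

```php
<?php
// Hypothetical sketch: share of attributes on which two fingerprints agree.
// Attributes present in only one fingerprint count as mismatches.
function fingerprint_similarity(array $known, array $candidate): float {
    $keys = array_unique(array_merge(array_keys($known), array_keys($candidate)));
    if (count($keys) === 0) {
        return 0.0;
    }
    $matches = 0;
    foreach ($keys as $key) {
        // An attribute matches only if both sides have it with the same value.
        if (array_key_exists($key, $known) && array_key_exists($key, $candidate)
            && $known[$key] === $candidate[$key]) {
            $matches++;
        }
    }
    return $matches / count($keys);
}

// Assumed decision rule: treat the candidate as the same browser above a threshold.
function is_probably_same_browser(array $known, array $candidate, float $threshold = 0.8): bool {
    return fingerprint_similarity($known, $candidate) >= $threshold;
}
```

For example, if a user's five recorded attributes stay the same except for the time zone, the similarity is 4/5 = 0.8 and the browser is still recognised; if the plugins and fonts change as well, it drops to 0.6 and the match is rejected.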
Current evaluation of the project
Here you can find the current statistics for this project. The analysis covers the number of fingerprints, the number of multiple or duplicate records, percentages, accuracy and more. Just have a look:
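As a rough illustration of what figures like these measure, the following sketch counts how many fingerprints in a list are unique and how many collide with another record. The function name and output keys are hypothetical, and this is not the code behind the statistics page.

```php
<?php
// Hypothetical sketch: summarise a flat list of fingerprint hashes.
function fingerprint_stats(array $hashes): array {
    $counts = array_count_values($hashes);  // hash => number of occurrences
    $total = count($hashes);
    $unique = 0;
    foreach ($counts as $n) {
        // A fingerprint seen exactly once identifies a single record.
        if ($n === 1) {
            $unique++;
        }
    }
    return array(
        'total' => $total,
        'distinct' => count($counts),
        'unique' => $unique,
        // Records whose hash is shared with at least one other record.
        'duplicate_records' => $total - $unique,
        'unique_percent' => $total > 0 ? 100 * $unique / $total : 0.0,
    );
}
```

For the list a, b, b, c, c, c this yields six records, three distinct fingerprints, one unique fingerprint and five duplicate records, so only about 16.7% of the records are uniquely identified.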
http://frauddb.framsteg.de/fingerprinting/statistics.php
If you want to participate, just go ahead and visit: