Big Data and Privacy

There has been much discussion lately about the advantages and disadvantages of Big Data. Prof. Dirk Helbing, for example, reflects on fundamental social aspects of the subject in his analysis “Big data, big impact”. But let us approach this issue properly, without just spewing buzzwords.


What is Big Data?


The term “big data” describes collections of data records so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The challenges of this topic include capturing, analyzing, transferring, storing, searching and visualizing the data.


In a dynamic, global economy, organizations have begun to rely more heavily on insights from their customers, internal processes and business operations in order to uncover new opportunities for growth. In the process of discovering and determining these insights, large, complex sets of data are generated that must then be managed, analyzed and manipulated by skilled professionals. The compilation of this large collection of data is collectively known as “big data.”


So, how big is big data?


Most professionals in the industry consider multiple terabytes or petabytes to be the current big data benchmark. Others, however, are hesitant to commit to a specific quantity, as the rapid pace of technological development may render today’s concept of “big” as tomorrow’s “normal.” Still others will define big data relative to its context. In other words, big data is a subjective label attached to situations in which human and technical infrastructures are unable to keep pace with a company’s data needs.


(Ref: Villanova University)


What would be an example of big data?


The affiliate industry is the best example of this. It often needs unimaginable amounts of information to generate effective advertising proposals. For a better understanding, I will distinguish two parts of it:

One part of the affiliate industry, including companies like zanox AG and Affilinet GmbH, works with so-called "subtle advertising". They provide fixed banners, which users often perceive as a nuisance.


On the other side there are companies like plista GmbH, which provide recommendations instead. These recommendations are tailored to the interests and desires of the consumer. To realize such tailoring, they resort to scientific methods like collaborative filtering, which are based on the predictive behavioral targeting method. If such algorithms are to be used in real time, they need to draw on a wide variety of information. The more information feeds into this technique, the more precise the behavior analysis will be.

To get an impression of this topic, one can look into the Open Recommendation Platform (ORP) of plista GmbH. The ORP is a pretty cool feature developed by Torben Brodt's engineering team, where one can design and implement one's own algorithms. Furthermore, these algorithms may be tested with real recommendation traffic, while the ORP returns the actual impressions of unique visitors. This should be considered one of the first real insights into such a deep and important Big Data topic. Researchers are able to collect huge amounts of anonymous visitor and behavior data in order to analyze them.


How does this data analysis work?


The basic concept is trivial. Imagine you have twins. They look exactly the same and have the same interests and the same character traits. Once you know all about one of these twins, you also know everything about the other twin. So the interesting part is to find the twin, and this step is equivalent to finding the fitting target group. Based on theories like collaborative filtering and predictive behavioral targeting, the system establishes a matrix of so-called interest tables. It also knows sample values for each target group.
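The twin idea can be illustrated with a minimal sketch. It is not the method of any company named here; it assumes a tiny, made-up interest table (the user names, topics and scores are hypothetical) and uses cosine similarity, one common choice in collaborative filtering, to find the most similar other user. The twin's score is then borrowed for a topic we know nothing about.

```python
from math import sqrt

# Hypothetical interest scores between 0 and 1; all names and values are made up.
INTERESTS = {
    "anna": {"travel": 0.9, "fitness": 0.8, "cars": 0.1},
    "ben": {"travel": 0.2, "fitness": 0.1, "cars": 0.9},
    "carl": {"travel": 0.8, "fitness": 0.9, "cars": 0.2, "books": 0.7},
}


def cosine(a, b):
    """Cosine similarity of two sparse interest vectors stored as dicts."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_twin(user, interests):
    """Return the other user whose interests are most similar to `user`."""
    others = (name for name in interests if name != user)
    return max(others, key=lambda name: cosine(interests[user], interests[name]))


def predict_interest(user, topic, interests):
    """Guess an unknown interest of `user` by borrowing the twin's score."""
    twin = find_twin(user, interests)
    return interests[twin].get(topic, 0.0)


twin = find_twin("anna", INTERESTS)
guess = predict_interest("anna", "books", INTERESTS)
```

Real systems work with far larger and sparser tables and usually average over many similar users instead of a single twin, but the principle stays the same.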


Take, for example, the target group of male government employees aged 50+. They are most likely interested in vacation trips, tips about losing fat and remedies for hair loss. No offense. I will probably have the same problems.

Another classic example would be a woman between 25 and 35 who recently bought sweet-and-sour products and searched for pregnancy topics. A typical recommendation would suggest baby products.

(from Wikipedia, the free encyclopedia)

So basically the system collects the connections between the targeted user and the interest tables in the matrix and compares them to known target groups. The best-fitting group, calculated with a well-guarded algorithm, is the solution on which the recommendation is based.

Through this procedure the user is put into one or several user groups in order to identify his or her interests and behavior.
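As a rough sketch of this matching step, the following snippet compares one user's interest values with the sample values of two target groups. Everything in it is an assumption for illustration: the group names, the topics, the scores, the simple similarity measure (one minus the mean absolute difference) and the threshold. The actual algorithms used in the industry are not public.

```python
# Hypothetical sample values per target group; names and numbers are made up.
TARGET_GROUPS = {
    "male_50plus_government": {
        "vacation": 0.9, "weight_loss": 0.8, "hair_loss": 0.7, "baby_products": 0.1,
    },
    "female_25_35_expecting": {
        "vacation": 0.4, "weight_loss": 0.2, "hair_loss": 0.1, "baby_products": 0.9,
    },
}


def fit(user, group):
    """Similarity in [0, 1]: one minus the mean absolute difference per topic."""
    topics = set(user) | set(group)
    total = sum(abs(user.get(t, 0.0) - group.get(t, 0.0)) for t in topics)
    return 1.0 - total / len(topics)


def best_group(user, groups):
    """Name of the single best-fitting target group."""
    return max(groups, key=lambda name: fit(user, groups[name]))


def matching_groups(user, groups, threshold=0.8):
    """All groups whose fit reaches the (assumed) threshold; may be several."""
    return [name for name in groups if fit(user, groups[name]) >= threshold]


# Hypothetical observed interests of one visitor.
visitor = {"vacation": 0.8, "weight_loss": 0.7, "hair_loss": 0.6, "baby_products": 0.0}
group = best_group(visitor, TARGET_GROUPS)
```

The threshold variant reflects that a user may end up in more than one group. How strict it should be is a tuning decision that this sketch does not try to answer.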


What could be a danger?


As I already mentioned, Big Data is a huge step forward for the work of researchers and for personalized services. But we should also be aware of the possible side effects. These data and technologies could also be used for data mining and profiling in order to create advanced behavioral models. As long as our government and authorities are not capable of ensuring the necessary security and awareness as well as applicable and thorough data protection, Big Data will remain a possible danger from my point of view. And we are already being confronted with this aspect, as a number of cases from the last few weeks show us.


What's new about this topic?

Security issues are not really a new danger. They have always appeared and will always be present. Beyond this point, a combination of attacks increases the potential risk exponentially. With the help of tracking pixels, it is possible to collect even more information about users, up to the point where the user is fully transparent. The average user may know that he or she should not provide too many personal details. But he or she is often not aware that several dots can still be connected to form a line.
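How a few harmless-looking dots form a line can be shown with a small, purely illustrative sketch. It assumes two hypothetical data sets: anonymous records as a tracking pixel might collect them, and a public list that contains names. Neither reveals much on its own, but joining them on a few shared attributes (postal code, year of birth and gender are assumed here) can single out a person.

```python
# Hypothetical anonymous tracking records: no names, only attributes.
TRACKED = [
    {"visitor": "v1", "zip": "10115", "birth_year": 1985, "gender": "f", "topic": "pregnancy"},
    {"visitor": "v2", "zip": "97070", "birth_year": 1960, "gender": "m", "topic": "hair loss"},
    {"visitor": "v3", "zip": "10115", "birth_year": 1990, "gender": "m", "topic": "cars"},
]

# Hypothetical public list with names, e.g. a member directory.
PUBLIC = [
    {"name": "Person A", "zip": "10115", "birth_year": 1985, "gender": "f"},
    {"name": "Person B", "zip": "10115", "birth_year": 1990, "gender": "m"},
    {"name": "Person C", "zip": "97070", "birth_year": 1960, "gender": "m"},
    {"name": "Person D", "zip": "10115", "birth_year": 1990, "gender": "m"},
]

# Attributes that are not identifying alone but may be in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(tracked, public, keys=QUASI_IDENTIFIERS):
    """Map each visitor to (name, topic) when exactly one public record matches."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    linked = {}
    for row in tracked:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the visitor
            linked[row["visitor"]] = (candidates[0], row["topic"])
    return linked


result = link(TRACKED, PUBLIC)
```

In this made-up data, visitors v1 and v2 are matched to exactly one name each, while v3 stays ambiguous because two people share the same attributes. The more attributes are collected, the fewer such ambiguous cases remain.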


Jani Podlesny

Head of Engineering

I focus on Data Architecture and Analytics for Management Consulting across EMEA and the US. Out of my passion for Data Profiling & Privacy, I am doing PhD research at the Hasso Plattner Institute.
