What is process mining?
Process mining is a technique from the field of process management that lets users analyse business processes based on the behavior recorded in event logs. The original idea is to extract knowledge from IT systems and visualize it at the meta level. In most cases this knowledge is stored in logs. Discovery and conformance metrics let us derive results from those logs and verify them.
Why is this so important?
Imagine the following situation. Your company has a high response time, or producing a product or delivering a service takes much longer than expected. The reason is not necessarily that the employees lack skill; a more typical reason is a weak point in the value chain of the business process, for example a department that cannot handle the amount of work because of missing or inexperienced personnel.
If you modeled the value chain of the German administration system in a citizen center, you would get a sad result due to the budget cuts of the last years. It therefore takes more than 3 weeks to get an appointment for requesting a passport and another 3 months to receive it. Another example is the production of a car on a production line. We all know that the weakest point determines the speed of production. Process mining helps to find the causes.
We already discussed a similar topic within the area of security analysis of organized IT crime.
How to mine processes?
There are several approaches to extracting knowledge from event logs. One of the most common is the Alpha Algorithm with its improvements Alpha(+/++):
“The Alpha(+/++) Algorithm aims at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Măruşter. Several extensions or modifications of it have since been presented, which will be listed below. Within the concept of the algorithm one takes a workflow log as input and results in a workflow net being constructed. It does so by examining causal relationships observed between tasks. For example, one specific task might always precede another specific task in every execution trace, which would be useful information.”
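To make the idea of "causal relationships observed between tasks" more concrete, here is a minimal sketch of the first step such an algorithm performs: deriving ordering relations from traces. The function names and the sample log are our own assumptions for illustration, and the sketch stops before the construction of the workflow net.

```python
from itertools import product

def directly_follows(traces):
    """Collect every pair (a, b) where b comes right after a in some trace."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs

def footprint(traces):
    """Classify each pair of activities by the ordering seen in the log.

    "->" : a is directly followed by b, never the other way round (causality)
    "<-" : the reverse of the above
    "||" : both orders occur (the tasks look parallel)
    "#"  : the two tasks never directly follow each other
    """
    activities = sorted({a for trace in traces for a in trace})
    df = directly_follows(traces)
    relations = {}
    for a, b in product(activities, repeat=2):
        forward, backward = (a, b) in df, (b, a) in df
        if forward and not backward:
            relations[(a, b)] = "->"
        elif backward and not forward:
            relations[(a, b)] = "<-"
        elif forward and backward:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return relations

# Hypothetical log: each tuple is one execution trace.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
fp = footprint(log)
```

In this sample log, `a` always precedes `b`, so the pair is classified as causal, while `b` and `c` appear in both orders and are treated as parallel.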
Another approach is the so-called social miner. It aims at a different target group but is also very interesting:
“The Social mining or rather the analysis of social interconnected relationships is one of the most interesting topics of today's world. When deriving roles and other organizational entities from the event log the focus is on the relation between people or groups of people and the process. Another perspective is not to focus on the relation between the process and individuals but on relations among individuals (or groups of individuals). ”
Facebook uses it, and even the NSA uses social mining to understand the connections between people. But these miners have a weak point.
What's the matter with it?
As we all know, logs are not perfect. They contain a lot of so-called “noise”: incorrect log entries that distort the results with wrong data. The real issue is detecting these noise entries.
There are several approaches to handling noise:
- “The main motivation of the genetic algorithms (Eiben and Smith 2003) is to benefit from the global search that is performed by this kind of algorithms. Genetic algorithms are adaptive search methods that try to mimic the process of evolution. These algorithms start with an initial population of individuals. Every individual is assigned a fitness measure to indicate its quality. In our case, an individual is a possible process model and the fitness is a function that evaluates how well the individual is able to reproduce the behavior in the log. Populations evolve by selecting the fittest individuals and generating new individuals using genetic operators such as crossover (combining parts of two or more individuals) and mutation (random modification of an individual).”
- “The Heuristic Miner extends the alpha algorithm by consider the frequency of traces in the log. Heuristics miner can deal with noise, and can be used to express the main behavior. The Heuristics Miner Plugin mines the control flow perspective of a process model. To do so, it only considers the order of the events within a case. In other words, the order of events among cases isn't important. For instance for the log in the log file only the fields case id, time stamp and activity are considered during the mining. The timestamp of an activity is used to calculate these orderings.”
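The ordering step described in the second quote can be sketched as follows: group events by case id, order them by timestamp within each case, and count how often one activity directly follows another. The dependency measure shown is the one commonly cited for the Heuristics Miner; it is not spelled out in the quote, so treat it as our assumption. The event data and function names are hypothetical as well.

```python
from collections import Counter, defaultdict

def build_traces(events):
    """Group (case_id, timestamp, activity) rows into one ordered trace per case."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    # Only the order inside a case matters, so each case is sorted on its own.
    return {case: [act for _, act in sorted(rows)] for case, rows in by_case.items()}

def directly_follows_counts(traces):
    """Count how often activity b comes right after activity a."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts

def dependency(counts, a, b):
    """Commonly cited dependency measure: (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# Hypothetical event rows, deliberately unsorted and interleaved across cases.
events = [
    ("c3", 7, "check"), ("c1", 1, "register"), ("c2", 2, "check"),
    ("c1", 3, "approve"), ("c3", 5, "register"), ("c2", 1, "register"),
    ("c1", 2, "check"), ("c3", 6, "approve"), ("c2", 3, "approve"),
]
traces = build_traces(events)
counts = directly_follows_counts(traces)
```

A value close to 1 means that `a` is almost always followed by `b` and rarely the other way round; values near 0 indicate that the log gives no clear direction.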
So what is the problem?
The biggest advantage of the heuristic algorithm is also its main problem: the threshold. By increasing the threshold we can remove instances with a low frequency. But we have to be careful, because the threshold applies to the entire net and not to single edges within it. There is therefore always the possibility of removing process-relevant information by increasing the threshold, and this is the failure we try to handle.
Let me make that clear with a small example. We have the log containing the following entries:
Normally, following the approach of the heuristic miner, we would increase the threshold to 3 in order to get rid of the assumed noise. To be safe, we would then always have to interview a domain expert. That is not our goal, so we have to think of something else.
The workflow above is the original process represented in our log. If we had increased the threshold to 3, our main failover plan would no longer work. This issue affects all processes that involve backup and failover technology, because a backup or failover should only appear in about 1 of 1000000000 cases in the event log. It is a FAIL over, after all, and not the average case.
A detailed explanation is enclosed.
Just imagine we increase the threshold and kick the failover instances out of the process. That would be a state of emergency.
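The effect can be reproduced with a small sketch. The log below is hypothetical and is not the one from our example; it only assumes that a legitimate failover path and a noisy shortcut are equally rare. A single net-wide frequency threshold then cannot tell them apart.

```python
from collections import Counter

def edge_counts(log):
    """Sum the frequency of every directly-follows edge over all trace variants."""
    counts = Counter()
    for trace, frequency in log.items():
        for edge in zip(trace, trace[1:]):
            counts[edge] += frequency
    return counts

def filter_edges(counts, threshold):
    """Keep only edges seen at least `threshold` times (one global cut-off)."""
    return {edge for edge, n in counts.items() if n >= threshold}

def activities(edges):
    """All activities that still appear in the remaining edges."""
    return {activity for edge in edges for activity in edge}

# Hypothetical log: trace variant -> number of cases.
log = {
    ("request", "primary", "respond"): 10,             # normal case
    ("request", "primary", "failover", "respond"): 2,  # rare but legitimate
    ("request", "respond"): 2,                         # assumed noise
}
counts = edge_counts(log)
kept = filter_edges(counts, threshold=3)
```

With a threshold of 3 the noisy shortcut from `request` to `respond` disappears, but so does every edge that touches `failover`, and with it the activity itself.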
So, how can we handle this disadvantage?
Basically, we thought of a two-step improvement of the heuristic miner.
The Preprocessing Stage is the major part: it finds the sibling model of the instance under investigation. Here we compare the log against all logs in our archive, using algorithms close to those of the predictable behavioral analysis group. But instead of comparing behaviors, we look at the activities in order to find related ones. If we have a match, we mark the congruent model.
During the Postprocessing Stage we compare the results of our heuristic miner with the results of the preprocessing stage. While we increase the threshold, we can check in real time against the congruent model whether process-relevant activities get kicked out. We are therefore able to increase the threshold without losing process-relevant activities.
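The two stages can be sketched roughly as follows. This is an illustration under our own assumptions and not the code of the service: we use Jaccard similarity on activity sets as a stand-in for the matching step, and the archive, the log, and all names are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Overlap of two activity sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_sibling(log_activities, archive, min_similarity=0.5):
    """Preprocessing: pick the archived model whose activities match best."""
    best_name, best_score = None, 0.0
    for name, model_activities in archive.items():
        score = jaccard(log_activities, model_activities)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None

def edge_counts(log):
    """Sum the frequency of every directly-follows edge over all trace variants."""
    counts = Counter()
    for trace, frequency in log.items():
        for edge in zip(trace, trace[1:]):
            counts[edge] += frequency
    return counts

def activities_at(counts, threshold):
    """Activities that survive a global frequency threshold."""
    return {a for edge, n in counts.items() if n >= threshold for a in edge}

def max_safe_threshold(counts, required, upper):
    """Postprocessing: raise the threshold until a required activity would vanish."""
    safe = 0
    for threshold in range(1, upper + 1):
        if required <= activities_at(counts, threshold):
            safe = threshold
        else:
            break
    return safe

# Hypothetical archive of known models: name -> activity set.
archive = {
    "web_service_with_failover": {"request", "primary", "failover", "respond", "audit"},
    "passport_application": {"apply", "verify", "print", "deliver"},
}
# Hypothetical log: trace variant -> number of cases.
log = {
    ("request", "primary", "respond"): 10,
    ("request", "primary", "failover", "respond"): 2,
    ("request", "respond"): 1,
}
counts = edge_counts(log)
log_activities = activities_at(counts, 1)
sibling = find_sibling(log_activities, archive)
# Only activities that actually occur in the log can be protected.
required = archive[sibling] & log_activities
safe = max_safe_threshold(counts, required, upper=5)
```

In this sketch the threshold stops at the last value that keeps `failover` in the net, which is still high enough to drop the one-off shortcut. Note that the comparison protects activities only, not individual edges.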
We developed this technique as SaaS. If you want to know more about it, just step over to our project site:
http://process-mining.framsteg.de
Thanks to our partners who helped us to create a comparable archive with sample processes for the comparison part.