Follow these steps to achieve the best results. A basic knowledge of BayMiner is required.
Prerequisites for data:
- The data set includes information that can reveal unknown risks.
- The data set includes information about the consequences in the past.
- The data set includes projects with both positive and negative outcomes.
- The data set does not include tens of variables with only yes/no alternatives.
- The data set does not include many self-evident co-occurrences.
- A human expert judges the results to identify causalities.
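Some of these prerequisites can be checked roughly before the data is loaded into BayMiner. The following is a minimal sketch in plain Python, not a BayMiner feature: the record layout, the column names, the function `check_prerequisites` and the limit of 10 two-valued variables are all illustrative assumptions.

```python
# Hypothetical pre-check of a project data set (a list of dict records).
# Column names and the max_binary threshold are illustrative assumptions.

def check_prerequisites(rows, outcome, good, bad, max_binary=10):
    """Return a list of warnings about the data set."""
    warnings = []
    # The data should contain projects with both positive and negative outcomes.
    outcomes = {row[outcome] for row in rows}
    if good not in outcomes or bad not in outcomes:
        warnings.append("data lacks positive or negative outcomes")
    # Count variables that take at most two distinct values (yes/no style).
    columns = rows[0].keys() if rows else []
    binary = [c for c in columns
              if c != outcome and len({row[c] for row in rows}) <= 2]
    if len(binary) >= max_binary:
        warnings.append(f"{len(binary)} variables have at most two values")
    return warnings

projects = [
    {"result": "profit", "delay_days": 0, "subcontractor": "yes"},
    {"result": "loss", "delay_days": 45, "subcontractor": "no"},
    {"result": "profit", "delay_days": 5, "subcontractor": "yes"},
]
print(check_prerequisites(projects, "result", "profit", "loss"))
```

An empty list means that none of the checked conditions was violated. Whether the data can reveal unknown risks, and whether co-occurrences are self-evident, still has to be judged by a human expert.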
Example risk factor analysis using BayMiner PRO.
- The desirable shape of the dot cloud in risk analysis is an ellipse-like figure, e.g. with the best projects at one end and the worst at the other.
- To achieve an ellipse-like shape, the data should include several variables that describe the strength of the consequences, such as profit, costs, delays and damages.
- Calculate the model.
- Rotate the general picture so that its longest dimension lies horizontally on the screen.
- To identify the main risks in the picture, change the colours of the most important variables by using the “highlight variable” function. If you cannot identify a shift of colours that represents a risk-related phenomenon, continue from A. First round of ignoring unnecessary variables.
- Select the outermost 20 to 30 % of the dots in the most interesting direction, e.g. unprofitable projects. Look at the profile of distributions. You will most probably recognize some self-evident information. Give this selection an explanatory name.
- Repeat the action for the other end of the dot cloud and name the selection.
- Compare these two selections against each other to identify unknown co-occurrences.
- In all data sets with many variables, experts always find known information, and it often manifests itself first. If you find self-evident co-occurrences, ignore the variables that cause them and carry out the process again. Continue in fast cycles to get rid of the uninteresting variables, but be patient and ignore only a few variables per trial, so as not to “throw the baby out with the bath water”. If you still cannot identify a shift of colours that represents a meaningful phenomenon, continue to B. Second ignoring of variables.
- Look for variables whose profile difference indicator has a high value. A high value indicates a stronger than average influence of the variable in question.
- Identify co-occurring variables that may reveal hidden risk.
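The selection and comparison steps above are done interactively in BayMiner. To make the idea concrete, here is a minimal sketch in plain Python, assuming each dot has been exported as a record with a horizontal coordinate `x` plus its variable values. The functions `outermost`, `share` and `profile_difference` are hypothetical, and the score used here (half the summed absolute differences in value shares) is only a stand-in, not BayMiner's profile difference indicator.

```python
from collections import Counter

def outermost(dots, fraction=0.25, end="left"):
    """Pick the given fraction of dots furthest toward one end of the x axis."""
    ordered = sorted(dots, key=lambda d: d["x"], reverse=(end == "right"))
    count = max(1, round(len(ordered) * fraction))
    return ordered[:count]

def share(selection, variable):
    """Relative frequency of each value of a variable within a selection."""
    counts = Counter(d[variable] for d in selection)
    return {value: n / len(selection) for value, n in counts.items()}

def profile_difference(sel_a, sel_b, variable):
    """Stand-in score (an assumption, not BayMiner's indicator):
    0 means identical distributions, 1 means no shared values."""
    a, b = share(sel_a, variable), share(sel_b, variable)
    return 0.5 * sum(abs(a.get(v, 0) - b.get(v, 0)) for v in set(a) | set(b))

# Made-up dots; the variable names are illustrative only.
dots = [
    {"x": -2.0, "soil": "clay", "season": "winter"},
    {"x": -1.5, "soil": "clay", "season": "summer"},
    {"x": -0.5, "soil": "rock", "season": "winter"},
    {"x": 0.0, "soil": "rock", "season": "summer"},
    {"x": 0.4, "soil": "sand", "season": "winter"},
    {"x": 1.0, "soil": "sand", "season": "summer"},
    {"x": 1.6, "soil": "sand", "season": "winter"},
    {"x": 2.1, "soil": "sand", "season": "summer"},
]
unprofitable = outermost(dots, 0.25, "left")   # named selection, one end
profitable = outermost(dots, 0.25, "right")    # named selection, other end
for variable in ("soil", "season"):
    print(variable, profile_difference(unprofitable, profitable, variable))
```

In this made-up data, `soil` differs completely between the two ends (score 1.0) while `season` does not differ at all (score 0.0), so `soil` would be the variable to examine further.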
A. First round of ignoring unnecessary variables.
If many of the variables have a prediction gain score of 0 % and a very skewed distribution, ignore them first.
If you cannot identify colour shifts that represent interesting risk-related phenomena, the reason may be one of the following:
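As a rough illustration of this first-round rule, the sketch below lists variables that combine a prediction gain of 0 % with a very skewed distribution. It assumes the prediction gain scores have been copied from BayMiner by hand into a dictionary. The function names and the skew limit (one value covering at least 90 % of the cases) are assumptions, since the text does not define "very skewed".

```python
from collections import Counter

def dominant_share(values):
    """Share of the most common value; close to 1.0 means a very skewed variable."""
    return Counter(values).most_common(1)[0][1] / len(values)

def first_round_candidates(columns, prediction_gain, skew_limit=0.9):
    """Variables with 0 % prediction gain and a dominant value share >= skew_limit."""
    return sorted(
        name for name, values in columns.items()
        if prediction_gain.get(name) == 0 and dominant_share(values) >= skew_limit
    )

# Made-up example data; prediction gain values are in percent.
columns = {
    "permit_required": ["yes"] * 19 + ["no"],
    "region": ["north", "south"] * 10,
    "delay_class": ["none"] * 12 + ["long"] * 8,
}
prediction_gain = {"permit_required": 0, "region": 0, "delay_class": 14}
print(first_round_candidates(columns, prediction_gain))
```

Only `permit_required` is listed here: `region` has a score of 0 % but is evenly distributed, and `delay_class` has a non-zero score.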
- There are not enough correlations in the data.
- You do not have enough data.
- There are too many variables.
- There are too many variables that influence each other strongly, covering up the hidden information.
- There is simply nothing to find.
B. Second ignoring of variables
Study the value of the prediction gain indicator.
- If several variables have a high score compared to the rest (that is, the list of scores in descending order shows a gap between the topmost values and the majority), these topmost variables predict each other and at the same time cover up the hidden relations you are looking for. Ignore these variables or, if one of them is of special concern to you, keep it, ignore the rest and re-check the situation.
- If many variables still have a score of 0 % and the number of variables needs to be reduced, ask domain experts which variables can be safely ignored with regard to the risks you believe exist.
- You may also need to normalize the values of some variables. Normalize by project size or, depending on the problem you are trying to solve, by any other central variable.
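Two of the points above can be sketched outside BayMiner in plain Python: finding the value gap in the descending list of prediction gain scores, and normalizing a variable by project size. Both functions are hypothetical helpers. The minimum gap of 10 percentage points and all variable names are assumptions, and the scores are assumed to have been copied from BayMiner by hand.

```python
def variables_above_gap(prediction_gain, min_gap=10.0):
    """Return the variables above the largest drop in the descending score list,
    or an empty list if that drop is smaller than min_gap percentage points."""
    ranked = sorted(prediction_gain.items(), key=lambda item: item[1], reverse=True)
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    if not drops or max(drops) < min_gap:
        return []
    cut = drops.index(max(drops)) + 1
    return [name for name, _ in ranked[:cut]]

def normalize(rows, variable, by):
    """Add '<variable>_per_<by>' to each record, e.g. cost overrun per project size.
    Assumes the 'by' value is never zero."""
    key = f"{variable}_per_{by}"
    return [{**row, key: row[variable] / row[by]} for row in rows]

# Made-up prediction gain scores in percent.
gain = {"budget": 62.0, "contract_value": 58.0, "duration": 55.0,
        "soil": 12.0, "season": 8.0, "crew": 0.0}
print(variables_above_gap(gain))

# The same absolute overrun weighs very differently in a small project.
projects = [{"overrun": 50.0, "size": 1000.0}, {"overrun": 50.0, "size": 200.0}]
print(normalize(projects, "overrun", "size"))
```

In this example the largest drop is between `duration` (55 %) and `soil` (12 %), so the three topmost variables are the candidates to ignore. Which of them to keep remains a judgement for the analyst.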
Looking for clusters
Any visually appealing construct in the dot cloud is potentially interesting. Those that stand out are clusters or emerging clusters. Select a sample from the middle of the cluster, try to identify a suitable comparison cluster or class, and compare the two against each other. Continue at step 8.
For more guidance, see our menu under “Use BayMiner” at the top of the page.