Codementor Events

Decision Tree: Knowing Every Possible Output

Published Dec 26, 2019Last updated Dec 27, 2019

A decision tree is one of the simplest and most interpretable ways to analyze the consequences of each possible output, be it in data mining, statistics, or machine learning. It is a supervised learning approach that can be used for both classification and regression.

A decision tree visually represents decisions and makes the decision-making process explicit. While building a decision tree, a different question is asked at each node, and the answer determines which branch to follow next.


**So, a tree can be defined as:**

  1. The root at the top
  2. Internal nodes that represent conditions
  3. Leaves, the ends of branches that do not split, which represent the decisions

A real dataset will have many conditions and far more decision sets; a larger set of branches makes for a much bigger tree, yet the simple question-and-answer flow stays the same. This is why decision trees are among the most popular approaches in machine learning.
Building a tree means choosing which features and conditions to use for splitting, and deciding when to stop. It can help solve both classification and regression problems.

Approaches to Build a Decision Tree

Based on the internal nodes, or the conditions, we can build a tree that asks different types of questions.

1. Gini Impurity Approach
Gini impurity measures how often data would be incorrectly classified. If a dataset is pure (all items belong to the same class), the impurity is 0. If the dataset is a mixture of different classes, the impurity is high.

Steps to create a decision tree

  • List all the rows of the dataset used to build the decision tree.
  • Calculate the uncertainty, i.e., how mixed the data is.
  • List all the questions that could be asked.
  • Partition the rows into True and False branches for each question.
  • Calculate the information gain based on the resulting impurity.
  • Keep track of the question with the highest information gain.
  • Divide the node according to that best question, and repeat.
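The steps above can be sketched in Python. This is a minimal, illustrative sketch, not the article's own code: `gini`, `best_split`, and the toy weight-based questions are all hypothetical names introduced here to show how the best question is chosen.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of a list of rows whose last element is the class label."""
    counts = Counter(row[-1] for row in rows)
    return sum((n / len(rows)) * (1 - n / len(rows)) for n in counts.values())

def best_split(rows, questions):
    """Try each question, partition rows into True/False branches, and keep
    the split with the highest impurity reduction (information gain)."""
    current = gini(rows)
    best_gain, best_q = 0.0, None
    for q in questions:
        true_rows = [r for r in rows if q(r)]
        false_rows = [r for r in rows if not q(r)]
        if not true_rows or not false_rows:
            continue  # this question does not actually split the data
        w = len(true_rows) / len(rows)
        gain = current - (w * gini(true_rows) + (1 - w) * gini(false_rows))
        if gain > best_gain:
            best_gain, best_q = gain, q
    return best_q, best_gain

# Hypothetical toy dataset: (weight_kg, label)
rows = [(200, 'Tiger'), (190, 'Tiger'), (5000, 'Elephant'), (4800, 'Elephant')]
questions = [lambda r: r[0] > 1000, lambda r: r[0] > 195]

q, gain = best_split(rows, questions)
print(gain)  # 0.5: the weight > 1000 question separates the classes perfectly
```

The first question yields two pure children, so its gain equals the full parent impurity (0.5); the second leaves a mixed child, so it scores lower.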

Gini Impurity Formula
G(k) = Σ_{i=1}^{J} P(i) * (1 − P(i))

where P(i) is the fraction of items of class i at node k, and J is the number of classes.

Example:

Take the example of a dataset with no mixing:

no_mixing = [['Tiger'],
             ['Tiger']]
This gives an output of 0.

some_mixing = [['Tiger'],
               ['Elephant']]
This gives an output of 0.5.

lots_of_mixing = [['Tiger'],
                  ['Elephant'],
                  ['Giraffe'],
                  ['Grapefruit'],
                  ['Rhino']]
This gives an output of 0.8.
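A small Python helper (a hypothetical function written here for illustration, not from the article) reproduces these numbers directly from the formula above:

```python
from collections import Counter

def gini_impurity(rows):
    """Gini impurity G = sum over classes of P(i) * (1 - P(i))."""
    counts = Counter(row[0] for row in rows)
    total = len(rows)
    return sum((n / total) * (1 - n / total) for n in counts.values())

no_mixing = [['Tiger'], ['Tiger']]
some_mixing = [['Tiger'], ['Elephant']]
lots_of_mixing = [['Tiger'], ['Elephant'], ['Giraffe'],
                  ['Grapefruit'], ['Rhino']]

print(gini_impurity(no_mixing))       # 0.0 (pure: one class)
print(gini_impurity(some_mixing))     # 0.5 (two classes, 50/50)
print(gini_impurity(lots_of_mixing))  # ≈ 0.8, i.e. 1 - 5 * (1/5)^2
```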

2. Information Gain Approach
The information gain approach is used to decide which feature to split on at each step while building the tree, and it helps keep the tree small. At each step, you choose the split that produces the purest child nodes. Information gain measures how much a split reduces impurity (entropy): the split with the highest information gain is chosen first, and the process continues until all leaf nodes are pure or the information gain becomes 0.

The Equation of Information gain:

Information gain = entropy(parent) − [weighted average] entropy(children)

Entropy helps control how a decision tree splits the data, and this directly shapes the decision boundaries the tree draws.
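As a quick sketch of the equation above, assuming Shannon entropy over class labels (`entropy` and `information_gain` are illustrative names introduced here):

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy H = -sum of P(i) * log2(P(i)) over the class labels."""
    counts = Counter(row[0] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(parent, left, right):
    """entropy(parent) minus the weighted average entropy of the children."""
    w = len(left) / len(parent)
    return entropy(parent) - (w * entropy(left) + (1 - w) * entropy(right))

parent = [['Tiger'], ['Tiger'], ['Elephant'], ['Elephant']]
left, right = [['Tiger'], ['Tiger']], [['Elephant'], ['Elephant']]
print(information_gain(parent, left, right))  # 1.0: a perfect split removes all uncertainty
```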

Advantages of Creating a Decision Tree Model

  1. Decision trees are easy to understand and create
  2. Can handle any type of data, be it numerical or categorical
  3. Require very little data preprocessing
  4. Do not require normalization of data
  5. Do not require scaling of data either
  6. Are relatively robust to missing values

Some Disadvantages of Including a Decision Tree in Your Approach

  1. Prone to overfitting
  2. Require a suitable impurity measurement
  3. Require parameter tuning
  4. Can produce biased trees if some classes dominate
  5. A small change in the data can cause a large change in the tree
  6. Retraining after changes can take considerable time
  7. Less suited to regression tasks that predict smooth continuous values

A decision tree is a strong predictive model for the quantitative analysis of business problems. It makes results easy to validate by naturally classifying problems, and with some modification it can also handle regression problems.

Some of the Common Applications of Decision Tree

1. Engineering
Engineering is one of the most important domains for decision trees; they have been widely used in energy consumption analysis, fault diagnosis, and healthcare management. Even though several methods can analyze energy consumption, the decision tree is often the preferred one, because its hierarchical structure provides a useful representation with a deep level of insight and information.
Another application in the engineering domain is fault detection, especially in rotating machines. Detection involves measuring a number of variables that can be easily evaluated with a decision tree structure.

2. Business Management
Decision trees are a great way to extract useful information from a database, which can then be used to enhance customer service. They have been employed in many business and management applications, and decision tree modelling is increasingly used in customer management and fraud detection. For instance, when migrating a database over a VPN service on a Mac, these data facets can be analyzed with a decision tree to determine which information should be migrated.
Analyzing a large database can be done by collecting an individual's data and then providing recommendations from the extracted data. The developed decision tree can also suggest products that customers are likely to purchase based on their previous purchases.

3. Fraud Detection
Decision-tree-based fraud detection identifies fraudulent statements through statistical methods. This approach addresses fraud well because it considers all the variables during the modelling process. Previous research has found that decision trees can make a significant contribution to fraud detection due to their high accuracy rates.

Decision tree analysis also enhances the decision-making capabilities of commercial banks by assigning success and failure probabilities based on the available data. It likewise helps identify borrowers who do not meet the minimum-standard criteria and are unlikely to meet all minimum requirements in the future.

To Conclude:
Decision trees provide an approach that helps in quantifying the values and probability of each possible outcome of a decision, allowing decision makers to make educated choices among the various alternatives.

Discover and read more posts from Ashok Sharma