
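**What is the Gini index?**

The Gini index, also referred to as the Gini ratio or Gini coefficient, is one of the most commonly used measures of inequality. In decision tree induction it measures how mixed the class labels in a set of records are: a value of 0 means every record belongs to the same class, and larger values mean the classes are more evenly mixed.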
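For reference, the two quantities used in this tutorial can be written in general form. This is a sketch of the standard definitions, not notation taken from the original text: the symbols $p_i$ (the proportion of records in $X$ that belong to class $i$), $k$ (the number of classes), $A$ (an attribute) and $X_v$ (the subset of $X$ where $A = v$) are assumed here for illustration.

```latex
\mathrm{Gini}(X) = 1 - \sum_{i=1}^{k} p_i^{2}
\qquad
\mathrm{GiniGain}(A) = \mathrm{Gini}(X) - \sum_{v \in \mathrm{values}(A)} \frac{|X_v|}{|X|}\,\mathrm{Gini}(X_v)
```

For a binary target there are only two terms in the first sum, and for a binary attribute there are only two subsets in the second.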

The example below calculates the Gini index for binary variables.

| Student | inHostel | Target Class |
|---|---|---|
| True | True | Yes |
| True | True | Yes |
| True | False | No |
| False | False | Yes |
| False | True | No |
| False | True | No |
| False | False | No |
| True | False | Yes |
| False | True | No |

Now we will calculate the Gini gain of the two attributes, Student and inHostel.

**Step 1:**

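First compute the Gini index of the whole data set X from the Target Class column, which has 4 Yes and 5 No out of 9 records:

$\mathrm{Gini}(X) = 1 - \left[(4/9)^2 + (5/9)^2\right] = 40/81$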
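The Step 1 calculation can be checked with a few lines of Python. This is a minimal sketch, not part of the original tutorial: the function name `gini` and the list `target` are hypothetical names chosen for illustration, and exact fractions are used so the result can be compared directly with 40/81.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini index of a list of class labels (hypothetical helper):
    1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts.values())


# Target Class column of the example table: 4 Yes and 5 No.
target = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

print(gini(target))                   # 40/81
print(round(float(gini(target)), 3))  # 0.494
```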

**Step 2:**

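Split the records by Student and compute the Gini index of each branch. Of the 5 records with Student = False, 1 is Yes and 4 are No; of the 4 records with Student = True, 3 are Yes and 1 is No.

$\mathrm{Gini}(\text{Student} = \text{False}) = 1 - \left[(1/5)^2 + (4/5)^2\right] = 8/25$

$\mathrm{Gini}(\text{Student} = \text{True}) = 1 - \left[(3/4)^2 + (1/4)^2\right] = 3/8$

Each branch is weighted by the fraction of records that fall into it (4/9 for True, 5/9 for False):

$\mathrm{GiniGain}(\text{Student}) = \mathrm{Gini}(X) - \left[4/9 \cdot \mathrm{Gini}(\text{Student} = \text{True}) + 5/9 \cdot \mathrm{Gini}(\text{Student} = \text{False})\right] \approx 0.149$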
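The weighted subtraction in Step 2 can also be reproduced from the class counts alone. The sketch below is illustrative only: `gini_from_counts`, `parent` and `branches` are hypothetical names, and the (Yes, No) counts are read off the example table by hand.

```python
from fractions import Fraction


def gini_from_counts(counts):
    """Gini index from a tuple of per-class record counts (hypothetical helper)."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)


# (Yes, No) counts of Target Class, read off the example table.
parent = (4, 5)
branches = {"Student = True": (3, 1), "Student = False": (1, 4)}

n = sum(parent)
# Weight each branch by the fraction of records that fall into it.
weighted = sum(Fraction(sum(c), n) * gini_from_counts(c) for c in branches.values())
gain = gini_from_counts(parent) - weighted

print(gain, round(float(gain), 3))  # 121/810 0.149
```

Swapping in the inHostel counts, `(2, 3)` for True and `(2, 2)` for False, gives the Step 3 value in the same way.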

**Step 3:**

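$\mathrm{Gini}(\text{inHostel} = \text{False}) = 1 - \left[(2/4)^2 + (2/4)^2\right] = 1/2$

$\mathrm{Gini}(\text{inHostel} = \text{True}) = 1 - \left[(2/5)^2 + (3/5)^2\right] = 12/25$

$\mathrm{GiniGain}(\text{inHostel}) = \mathrm{Gini}(X) - \left[5/9 \cdot \mathrm{Gini}(\text{inHostel} = \text{True}) + 4/9 \cdot \mathrm{Gini}(\text{inHostel} = \text{False})\right] \approx 0.005$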

**Results**

The best attribute to split on is Student, because its Gini gain (0.149) is higher than that of inHostel (0.005).
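The whole comparison can be sketched end to end in Python, working from the rows of the table rather than from hand-counted totals. This is one possible implementation under stated assumptions, not the tutorial's own code: `rows`, `gini`, `gini_gain` and `attributes` are hypothetical names, and the Student column is encoded as booleans.

```python
from collections import Counter

# Rows of the example table as (Student, inHostel, Target Class).
rows = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]


def gini(labels):
    """Gini index of a list of class labels (hypothetical helper)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, col):
    """Gini of all rows minus the size-weighted Gini of each branch of column `col`."""
    gain = gini([r[-1] for r in rows])
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        gain -= len(subset) / len(rows) * gini(subset)
    return gain


attributes = {"Student": 0, "inHostel": 1}  # column index of each attribute
gains = {name: gini_gain(rows, col) for name, col in attributes.items()}
best = max(gains, key=gains.get)

for name, g in gains.items():
    print(f"{name}: {g:.3f}")  # Student: 0.149, then inHostel: 0.005
print("Best split:", best)     # Best split: Student
```

Using floats here keeps the code short; the rounded values match the hand calculation in Steps 2 and 3.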
