What is the Gini index?

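The Gini index, also referred to as the Gini ratio or Gini coefficient, is one of the most commonly used measures of inequality. In decision tree induction it measures the impurity of a set of tuples X: Gini(X) = 1 – Σ pᵢ², where pᵢ is the proportion of tuples in X that belong to class i. A set containing only one class has a Gini index of 0. The Gini gain of an attribute is Gini(X) minus the weighted average Gini index of the partitions that the attribute produces.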
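The definition Gini(X) = 1 – Σ pᵢ² can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name gini_impurity and the class labels "A" and "B" are hypothetical, and the result is kept as an exact Fraction so it can be compared with the fractions used in the steps below.

```python
from collections import Counter
from fractions import Fraction


def gini_impurity(labels):
    """Gini(X) = 1 - sum of squared class proportions, as an exact Fraction."""
    n = len(labels)
    if n == 0:
        raise ValueError("Gini index is undefined for an empty set")
    counts = Counter(labels)  # number of tuples per class
    return 1 - sum(Fraction(c, n) ** 2 for c in counts.values())


# Hypothetical labels: 4 tuples of class "A" and 5 of class "B",
# matching the 4/9 and 5/9 proportions used in Step 1.
print(gini_impurity(["A"] * 4 + ["B"] * 5))  # 40/81
```

A set with a single class, such as ["A", "A", "A"], returns 0, and an even two-class split such as ["A", "B"] returns the two-class maximum of 1/2.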

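The example below calculates the Gini index for binary attributes. The training set X contains 9 tuples, each described by two binary attributes, Student and inHostel, and a two-valued class label; 4 tuples belong to one class and 5 to the other.

We now compute the Gini index of X and the Gini gain of the attributes Student and inHostel.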

Step 1: Gini index of the whole training set X

Gini(X) = 1 – [(4/9)² + (5/9)²] = 40/81 ≈ 0.494
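Written out with a common denominator of 81, the Step 1 arithmetic is:

```latex
\mathrm{Gini}(X) = 1 - \left[\left(\tfrac{4}{9}\right)^2 + \left(\tfrac{5}{9}\right)^2\right]
                 = 1 - \tfrac{16 + 25}{81}
                 = \tfrac{40}{81} \approx 0.494
```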

Step 2: Split on Student (Student = True covers 4 tuples, Student = False covers 5)

Gini(Student = False) = 1 – [(1/5)² + (4/5)²] = 8/25

Gini(Student = True) = 1 – [(3/4)² + (1/4)²] = 3/8

GiniGain(Student) = Gini(X) – [4/9 · Gini(Student = True) + 5/9 · Gini(Student = False)] = 0.149
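The Gini gain in Step 2 can be reproduced from per-class counts. In this sketch the helper names gini_from_counts and gini_gain are hypothetical, and each pair of counts is assumed to list the class that has 4 tuples overall first; the Gini index itself does not depend on that ordering.

```python
from fractions import Fraction


def gini_from_counts(counts):
    """Gini index of a node given its per-class tuple counts."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def gini_gain(parent_counts, partitions):
    """Parent Gini minus the size-weighted Gini of each partition."""
    n = sum(parent_counts)
    weighted = sum(Fraction(sum(p), n) * gini_from_counts(p) for p in partitions)
    return gini_from_counts(parent_counts) - weighted


# Per-class counts read off Step 2: (class with 4 tuples, class with 5 tuples).
parent = (4, 5)
student_true = (3, 1)   # 4 tuples
student_false = (1, 4)  # 5 tuples

gain = gini_gain(parent, [student_true, student_false])
print(gain, round(float(gain), 3))  # 121/810 0.149
```

Keeping exact fractions shows that the 0.149 in Step 2 is a rounding of 121/810.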

Step 3: Split on inHostel (inHostel = True covers 5 tuples, inHostel = False covers 4)

Gini(inHostel = False) = 1 – [(2/4)² + (2/4)²] = 1/2

Gini(inHostel = True) = 1 – [(2/5)² + (3/5)²] = 12/25

GiniGain(inHostel) = Gini(X) – [5/9 · Gini(inHostel = True) + 4/9 · Gini(inHostel = False)] = 0.005
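Carrying the Step 3 arithmetic out exactly shows why the gain is so small: the weighted Gini index of the two inHostel partitions is almost equal to Gini(X).

```latex
\mathrm{GiniGain}(\mathrm{inHostel})
  = \tfrac{40}{81} - \left[\tfrac{5}{9}\cdot\tfrac{12}{25} + \tfrac{4}{9}\cdot\tfrac{1}{2}\right]
  = \tfrac{40}{81} - \left[\tfrac{4}{15} + \tfrac{2}{9}\right]
  = \tfrac{40}{81} - \tfrac{22}{45}
  = \tfrac{200 - 198}{405}
  = \tfrac{2}{405} \approx 0.0049
```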

Results

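The best split is on Student, because its Gini gain (0.149) is much higher than that of inHostel (0.005): splitting on Student reduces the impurity of X far more.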
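The selection rule can be sketched as "compute the Gini gain of every candidate attribute and keep the largest". The helper names and the dictionary of per-class counts below are illustrative assumptions, with the counts taken from Steps 2 and 3 (class with 4 tuples listed first).

```python
from fractions import Fraction


def gini_from_counts(counts):
    """Gini index of a node given its per-class tuple counts."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def gini_gain(parent_counts, partitions):
    """Parent Gini minus the size-weighted Gini of each partition."""
    n = sum(parent_counts)
    weighted = sum(Fraction(sum(p), n) * gini_from_counts(p) for p in partitions)
    return gini_from_counts(parent_counts) - weighted


parent = (4, 5)
# Per-class counts of each branch, listed as [True branch, False branch].
candidates = {
    "Student": [(3, 1), (1, 4)],
    "inHostel": [(2, 3), (2, 2)],
}

gains = {attr: gini_gain(parent, parts) for attr, parts in candidates.items()}
best = max(gains, key=gains.get)  # attribute with the highest Gini gain
for attr, g in gains.items():
    print(f"{attr}: {float(g):.3f}")
print("Best split:", best)
```

This prints Student: 0.149, inHostel: 0.005 and Best split: Student, matching the hand calculation.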
