Gini index for binary variables in data mining

What is the Gini index?

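In data mining, the Gini index (also called Gini impurity) measures how mixed the classes are in a set of records D: $\mathrm{Gini}(D) = 1 - \sum_i p_i^2$, where $p_i$ is the fraction of records in D that belong to class i. It is 0 when every record has the same class and reaches its maximum (1/2 for two classes) when the classes are equally represented. The term is borrowed from the Gini coefficient (also called the Gini ratio), the most commonly used measure of inequality.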
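The definition translates almost directly into code. The following is a minimal Python sketch that assumes the class labels of a node are given as a plain list. `gini` is a hypothetical helper, not a library function. It uses `fractions.Fraction` so the results stay exact and can be compared with hand-calculated fractions.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions.

    `gini` is a hypothetical helper for illustration, not a library function.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("Gini index is undefined for an empty set")
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


# A pure node scores 0; an even two-class mix scores the two-class maximum, 1/2.
print(gini(["Yes", "Yes", "Yes"]))       # 0
print(gini(["Yes", "No", "Yes", "No"]))  # 1/2
```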

The example below calculates the Gini index for two binary attributes, Student and inHostel, with respect to a binary Target Class (Yes/No).

Student   inHostel   Target Class
True      True       Yes
True      True       Yes
True      False      No
False     False      Yes
False     True       No
False     True       No
False     False      No
True      False      Yes
False     True       No
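Every Gini value in the steps that follow depends only on how many Yes and No records fall under each attribute value. The following Python sketch tallies those counts. It assumes the table is encoded as a list of `(Student, inHostel, Target Class)` tuples, with the attribute values stored as Python booleans. `ROWS`, `ATTRIBUTES` and `class_counts` are hypothetical names used only for illustration.

```python
from collections import Counter

# The nine training records from the table: (Student, inHostel, Target Class).
ROWS = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]
ATTRIBUTES = {"Student": 0, "inHostel": 1}  # attribute name -> tuple position


def class_counts(rows, attr_index):
    """Count Target Class labels separately for each value of one attribute."""
    counts = {}
    for row in rows:
        counts.setdefault(row[attr_index], Counter())[row[2]] += 1
    return counts


for name, idx in ATTRIBUTES.items():
    for value, counter in sorted(class_counts(ROWS, idx).items()):
        print(f"{name} = {value}: {dict(counter)}")
```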

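Now we calculate the Gini index of the whole data set X and of each value of Student and inHostel. We then calculate the Gini gain of splitting X on each attribute.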

Step 1:

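The data set X contains 9 records: 4 with Target Class = Yes and 5 with Target Class = No.

$$\mathrm{Gini}(X) = 1 - \left[\left(\tfrac{4}{9}\right)^2 + \left(\tfrac{5}{9}\right)^2\right] = \tfrac{40}{81} \approx 0.494$$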
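The Step 1 arithmetic can also be checked in Python. This sketch works from class counts rather than from a list of labels. `gini_from_counts` is a hypothetical helper name.

```python
from fractions import Fraction


def gini_from_counts(*class_counts):
    """Gini index from per-class record counts: 1 - sum((count / total) ** 2)."""
    total = sum(class_counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in class_counts)


# Whole data set X: 4 records with Target Class = Yes, 5 with Target Class = No.
gini_x = gini_from_counts(4, 5)
print(gini_x)  # 40/81
```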

Step 2:

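Student = False covers 5 records (1 Yes, 4 No), and Student = True covers 4 records (3 Yes, 1 No).

$$\mathrm{Gini}(\mathrm{Student} = \mathrm{False}) = 1 - \left[\left(\tfrac{1}{5}\right)^2 + \left(\tfrac{4}{5}\right)^2\right] = \tfrac{8}{25}$$

$$\mathrm{Gini}(\mathrm{Student} = \mathrm{True}) = 1 - \left[\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right] = \tfrac{3}{8}$$

$$\mathrm{GiniGain}(\mathrm{Student}) = \mathrm{Gini}(X) - \left[\tfrac{4}{9} \cdot \mathrm{Gini}(\mathrm{Student} = \mathrm{True}) + \tfrac{5}{9} \cdot \mathrm{Gini}(\mathrm{Student} = \mathrm{False})\right] = \tfrac{121}{810} \approx 0.149$$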
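Step 2 weights each partition's Gini index by that partition's share of the records, then subtracts the weighted sum from Gini(X). The following Python sketch performs the same Gini gain computation, using the (Yes, No) counts read off the table. `gini_from_counts` and `gini_gain` are hypothetical helpers.

```python
from fractions import Fraction


def gini_from_counts(*class_counts):
    """Gini index from per-class record counts."""
    total = sum(class_counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in class_counts)


def gini_gain(parent_counts, partitions):
    """Gini(parent) minus the size-weighted Gini index of each partition."""
    n = sum(parent_counts)
    weighted = sum(Fraction(sum(p), n) * gini_from_counts(*p) for p in partitions)
    return gini_from_counts(*parent_counts) - weighted


# (Yes, No) counts: whole set X, then Student = True and Student = False.
gain_student = gini_gain((4, 5), [(3, 1), (1, 4)])
print(gain_student, round(float(gain_student), 3))  # 121/810 0.149
```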

Step 3:

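inHostel = False covers 4 records (2 Yes, 2 No), and inHostel = True covers 5 records (2 Yes, 3 No).

$$\mathrm{Gini}(\mathrm{inHostel} = \mathrm{False}) = 1 - \left[\left(\tfrac{2}{4}\right)^2 + \left(\tfrac{2}{4}\right)^2\right] = \tfrac{1}{2}$$

$$\mathrm{Gini}(\mathrm{inHostel} = \mathrm{True}) = 1 - \left[\left(\tfrac{2}{5}\right)^2 + \left(\tfrac{3}{5}\right)^2\right] = \tfrac{12}{25}$$

$$\mathrm{GiniGain}(\mathrm{inHostel}) = \mathrm{Gini}(X) - \left[\tfrac{5}{9} \cdot \mathrm{Gini}(\mathrm{inHostel} = \mathrm{True}) + \tfrac{4}{9} \cdot \mathrm{Gini}(\mathrm{inHostel} = \mathrm{False})\right] = \tfrac{2}{405} \approx 0.005$$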
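Instead of counting records by hand, the gain can be computed directly from the table's columns. This sketch assumes the inHostel and Target Class columns are copied top to bottom into two parallel Python lists. `gini` and `gini_gain` are hypothetical helpers: `gini_gain` groups the labels by attribute value before weighting them.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def gini(labels):
    """Gini index of a non-empty list of class labels."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())


def gini_gain(values, labels):
    """Gini gain of splitting `labels` by the parallel list of attribute `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    weighted = sum(Fraction(len(g), n) * gini(g) for g in groups.values())
    return gini(labels) - weighted


# inHostel and Target Class columns from the table, top to bottom.
in_hostel = [True, True, False, False, True, True, False, False, True]
target = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

gain = gini_gain(in_hostel, target)
print(gain, round(float(gain), 3))  # 2/405 0.005
```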

Results

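Student is the best splitting attribute because its Gini gain (about 0.149) is much higher than that of inHostel (about 0.005). Splitting on Student therefore yields purer partitions.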
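Choosing the split attribute amounts to taking the attribute with the largest Gini gain. The following Python sketch does this over the whole table, assuming the records are encoded as `(Student, inHostel, Target Class)` tuples. `ROWS`, `ATTRIBUTES`, `gini` and `gini_gain` are hypothetical names.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# (Student, inHostel, Target Class) records from the table, top to bottom.
ROWS = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]
ATTRIBUTES = ["Student", "inHostel"]  # column order within each tuple


def gini(labels):
    """Gini index of a non-empty list of class labels."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, attr_index):
    """Gini of all rows minus the size-weighted Gini of the partitions by one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    weighted = sum(Fraction(len(g), len(rows)) * gini(g) for g in groups.values())
    return gini([row[-1] for row in rows]) - weighted


gains = {name: gini_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(best)  # Student
```

This sketch makes only the first split. A full decision-tree learner would repeat the same choice recursively on each resulting partition.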

Next Similar Tutorials

  1. Decision tree induction on categorical attributes
  2. Decision Tree Induction and Entropy in data mining
  3. Overfitting of decision tree and tree pruning
  4. Attribute selection Measures
  5. Computing Information-Gain for Continuous-Valued Attributes in data mining
  6. Gini index for binary variables
  7. Bagging and Bootstrap in Data Mining, Machine Learning
  8. Evaluation of a classifier by confusion matrix in data mining
  9. Holdout method for evaluating a classifier in data mining
  10. RainForest Algorithm / Framework
  11. Boosting in data mining
  12. Naive Bayes Classifier

Latest posts by Prof. Fazal Rehman Shamil