Last updated on: December 9th, 2018

Gini index for binary variables in data mining

What is Gini index?

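The Gini index is a commonly used measure of inequality. It is also referred to as the Gini ratio or Gini coefficient. In data mining it measures the impurity of a set of records: if $p_i$ is the proportion of records in class $i$, the Gini index is $1 - \sum_i p_i^2$, which is 0 when every record belongs to the same class.

The example below calculates the Gini index for binary variables.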

Student    inHostel    Target Class
True       True        Yes
True       True        Yes
True       False       No
False      False       Yes
False      True        No
False      True        No
False      False       No
True       False       Yes
False      True        No

Now we will calculate the Gini index of the Target Class for the whole data set X, and then the Gini gain of the attributes Student and inHostel.

Step 1:

$$\mathrm{Gini}(X) = 1 - \left[\left(\tfrac{4}{9}\right)^2 + \left(\tfrac{5}{9}\right)^2\right] = \tfrac{40}{81}$$
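Step 1 can be checked with a short Python sketch. The function name `gini` and the list `target` are illustrative names, not part of the original example; `target` holds the Target Class column of the table in row order, and `Fraction` is used so the result stays exact.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini index of a list of class labels: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


# Target Class column from the table, in row order (4 "Yes", 5 "No").
target = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

print(gini(target))  # 40/81
```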

Step 2:

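$$\mathrm{Gini}(\text{Student} = \text{False}) = 1 - \left[\left(\tfrac{1}{5}\right)^2 + \left(\tfrac{4}{5}\right)^2\right] = \tfrac{8}{25}$$

$$\mathrm{Gini}(\text{Student} = \text{True}) = 1 - \left[\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right] = \tfrac{3}{8}$$

$$\mathrm{GiniGain}(\text{Student}) = \mathrm{Gini}(X) - \left[\tfrac{4}{9} \cdot \mathrm{Gini}(\text{Student} = \text{True}) + \tfrac{5}{9} \cdot \mathrm{Gini}(\text{Student} = \text{False})\right] \approx 0.149$$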
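The Gini gain in Step 2 is the Gini index of the whole set minus the size-weighted Gini index of each partition. A minimal sketch of that calculation follows; `gini` and `gini_gain` are hypothetical helper names, and the Student column is encoded as booleans on the assumption that "Yes" in that column means True.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


def gini_gain(attribute, target):
    """Gini of the whole set minus the size-weighted Gini of each partition."""
    n = len(target)
    weighted = 0
    for value in set(attribute):
        # Target labels of the rows where the attribute takes this value.
        subset = [t for a, t in zip(attribute, target) if a == value]
        weighted += Fraction(len(subset), n) * gini(subset)
    return gini(target) - weighted


# Columns from the table, in row order.
student = [True, True, True, False, False, False, False, True, False]
target = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

gain = gini_gain(student, target)
print(gain, round(float(gain), 3))  # 121/810 0.149
```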

Step 3:

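$$\mathrm{Gini}(\text{inHostel} = \text{False}) = 1 - \left[\left(\tfrac{2}{4}\right)^2 + \left(\tfrac{2}{4}\right)^2\right] = \tfrac{1}{2}$$

$$\mathrm{Gini}(\text{inHostel} = \text{True}) = 1 - \left[\left(\tfrac{2}{5}\right)^2 + \left(\tfrac{3}{5}\right)^2\right] = \tfrac{12}{25}$$

$$\mathrm{GiniGain}(\text{inHostel}) = \mathrm{Gini}(X) - \left[\tfrac{5}{9} \cdot \mathrm{Gini}(\text{inHostel} = \text{True}) + \tfrac{4}{9} \cdot \mathrm{Gini}(\text{inHostel} = \text{False})\right] \approx 0.005$$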

Results

The best attribute to split on is Student because it has the higher Gini gain (0.149, compared with 0.005 for inHostel).
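The comparison can be reproduced end to end with the sketch below, which computes the Gini gain of both attributes and picks the larger one. The names `rows`, `gini`, `gini_gain`, and `gains` are illustrative, and "Yes" in the Student column is assumed to mean True.

```python
from collections import Counter
from fractions import Fraction

# (Student, inHostel, Target Class) for the nine rows of the table.
rows = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, column):
    """Gini gain from splitting rows on the attribute at index `column`."""
    target = [row[-1] for row in rows]
    weighted = 0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        weighted += Fraction(len(subset), len(rows)) * gini(subset)
    return gini(target) - weighted


gains = {name: gini_gain(rows, i) for i, name in enumerate(["Student", "inHostel"])}
best = max(gains, key=gains.get)

print(best, {name: round(float(g), 3) for name, g in gains.items()})
# Student {'Student': 0.149, 'inHostel': 0.005}
```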

 
