Gini index for binary variables in data mining

What is the Gini index?

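In data mining, the Gini index (also called Gini impurity) measures how mixed the class labels in a set of records are. For class proportions $p_i$ it is $\text{Gini} = 1 - \sum_i p_i^2$: it is 0 when every record belongs to one class and grows as the classes become more evenly mixed. Decision-tree algorithms such as CART use it to choose the attribute to split on. It shares its name with the Gini coefficient (also called the Gini ratio), a widely used measure of inequality, but the two quantities are calculated differently.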
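As a minimal sketch in Python, the impurity of a list of class labels can be computed exactly with `fractions.Fraction`, so the results match hand calculations written as fractions. The function name `gini` is our own choice for illustration, not part of any library:

```python
from collections import Counter
from fractions import Fraction

def gini(labels):
    """Gini impurity 1 - sum(p_i ** 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("Gini impurity is undefined for an empty set")
    counts = Counter(labels)
    # Each p_i is an exact fraction count / n, so no rounding error creeps in.
    return 1 - sum(Fraction(c, n) ** 2 for c in counts.values())

print(gini(["Yes", "Yes", "No", "No"]))  # 1/2, a perfectly mixed binary set
print(gini(["Yes", "Yes", "Yes"]))       # 0, a pure set
```

For two classes the value ranges from 0 (pure) to 1/2 (an even 50/50 mix).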

The example below calculates the Gini index for two binary attributes, Student and inHostel, with respect to a binary target class (Yes/No).

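| Student | inHostel | Target class |
|---------|----------|--------------|
| True    | True     | Yes          |
| True    | True     | Yes          |
| True    | False    | No           |
| False   | False    | Yes          |
| False   | True     | No           |
| False   | True     | No           |
| False   | False    | No           |
| True    | False    | Yes          |
| False   | True     | No           |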

Now we calculate the Gini index of each attribute value and the Gini gain of the attributes Student and inHostel.

Step 1:

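Gini index of the whole dataset X. Of the nine records, 4 belong to class Yes and 5 to class No, so:

$$\text{Gini}(X) = 1 - \left[\left(\tfrac{4}{9}\right)^2 + \left(\tfrac{5}{9}\right)^2\right] = \tfrac{40}{81}$$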
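A small Python sketch of Step 1, assuming the table is encoded as (Student, inHostel, target) tuples with True/False values for the two attributes. The names `DATA` and `gini` are hypothetical and re-declared here so the snippet runs on its own:

```python
from collections import Counter
from fractions import Fraction

# The nine records from the table: (Student, inHostel, Target class).
DATA = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]

def gini(labels):
    """Gini impurity 1 - sum(p_i ** 2), computed with exact fractions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())

targets = [row[2] for row in DATA]
print(targets.count("Yes"), targets.count("No"))  # 4 5
print(gini(targets))                              # 40/81
```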

Step 2:

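Split on Student. Student = True covers 4 records (3 Yes, 1 No); Student = False covers 5 records (1 Yes, 4 No).

$$\text{Gini}(\text{Student}=\text{False}) = 1 - \left[\left(\tfrac{1}{5}\right)^2 + \left(\tfrac{4}{5}\right)^2\right] = \tfrac{8}{25}$$

$$\text{Gini}(\text{Student}=\text{True}) = 1 - \left[\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right] = \tfrac{3}{8}$$

$$\text{GiniGain}(\text{Student}) = \text{Gini}(X) - \left[\tfrac{4}{9}\cdot\text{Gini}(\text{Student}=\text{True}) + \tfrac{5}{9}\cdot\text{Gini}(\text{Student}=\text{False})\right] = \tfrac{40}{81} - \left[\tfrac{1}{6} + \tfrac{8}{45}\right] = \tfrac{121}{810} \approx 0.149$$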
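The same weighted-average calculation for Student, sketched in Python. `gini_gain` is a hypothetical helper written for this example: it subtracts the size-weighted Gini of each branch from the Gini of the whole set, which is the formula used in Step 2:

```python
from collections import Counter
from fractions import Fraction

# (Student, Target class) pairs taken from the nine records in the table.
ROWS = [(True, "Yes"), (True, "Yes"), (True, "No"), (False, "Yes"), (False, "No"),
        (False, "No"), (False, "No"), (True, "Yes"), (False, "No")]

def gini(labels):
    """Gini impurity 1 - sum(p_i ** 2), computed with exact fractions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())

def gini_gain(rows):
    """Gini(X) minus the size-weighted Gini of each attribute-value branch."""
    targets = [t for _, t in rows]
    weighted = Fraction(0)
    for value in {v for v, _ in rows}:
        branch = [t for v, t in rows if v == value]
        weighted += Fraction(len(branch), len(rows)) * gini(branch)
    return gini(targets) - weighted

print(gini([t for s, t in ROWS if s]))      # 3/8
print(gini([t for s, t in ROWS if not s]))  # 8/25
gain = gini_gain(ROWS)
print(gain, round(float(gain), 3))          # 121/810 0.149
```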

Step 3:

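Split on inHostel. inHostel = True covers 5 records (2 Yes, 3 No); inHostel = False covers 4 records (2 Yes, 2 No).

$$\text{Gini}(\text{inHostel}=\text{False}) = 1 - \left[\left(\tfrac{2}{4}\right)^2 + \left(\tfrac{2}{4}\right)^2\right] = \tfrac{1}{2}$$

$$\text{Gini}(\text{inHostel}=\text{True}) = 1 - \left[\left(\tfrac{2}{5}\right)^2 + \left(\tfrac{3}{5}\right)^2\right] = \tfrac{12}{25}$$

$$\text{GiniGain}(\text{inHostel}) = \text{Gini}(X) - \left[\tfrac{5}{9}\cdot\text{Gini}(\text{inHostel}=\text{True}) + \tfrac{4}{9}\cdot\text{Gini}(\text{inHostel}=\text{False})\right] = \tfrac{40}{81} - \left[\tfrac{4}{15} + \tfrac{2}{9}\right] = \tfrac{2}{405} \approx 0.005$$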
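A matching Python sketch for inHostel. The data layout and the helper names `gini` and `gini_gain` are assumptions made for this example, repeated so the snippet is self-contained:

```python
from collections import Counter
from fractions import Fraction

# (inHostel, Target class) pairs taken from the nine records in the table.
ROWS = [(True, "Yes"), (True, "Yes"), (False, "No"), (False, "Yes"), (True, "No"),
        (True, "No"), (False, "No"), (False, "Yes"), (True, "No")]

def gini(labels):
    """Gini impurity 1 - sum(p_i ** 2), computed with exact fractions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())

def gini_gain(rows):
    """Gini(X) minus the size-weighted Gini of each attribute-value branch."""
    targets = [t for _, t in rows]
    weighted = Fraction(0)
    for value in {v for v, _ in rows}:
        branch = [t for v, t in rows if v == value]
        weighted += Fraction(len(branch), len(rows)) * gini(branch)
    return gini(targets) - weighted

print(gini([t for h, t in ROWS if not h]))  # 1/2
print(gini([t for h, t in ROWS if h]))      # 12/25
gain = gini_gain(ROWS)
print(gain, round(float(gain), 3))          # 2/405 0.005
```

Both branches stay close to the maximum binary impurity of 1/2, which is why splitting on inHostel barely reduces the impurity of the dataset.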

Results

Student is the better split attribute because its Gini gain (0.149) is much higher than that of inHostel (0.005).
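To choose the split automatically, one can compute the gain for every candidate attribute and take the maximum. A minimal sketch, assuming each record is stored as a dictionary of attribute values paired with its target class (a hypothetical layout chosen for this example):

```python
from collections import Counter
from fractions import Fraction

ATTRIBUTES = ["Student", "inHostel"]

# The nine records from the table as ({attribute: value}, target class).
DATA = [
    ({"Student": True, "inHostel": True}, "Yes"),
    ({"Student": True, "inHostel": True}, "Yes"),
    ({"Student": True, "inHostel": False}, "No"),
    ({"Student": False, "inHostel": False}, "Yes"),
    ({"Student": False, "inHostel": True}, "No"),
    ({"Student": False, "inHostel": True}, "No"),
    ({"Student": False, "inHostel": False}, "No"),
    ({"Student": True, "inHostel": False}, "Yes"),
    ({"Student": False, "inHostel": True}, "No"),
]

def gini(labels):
    """Gini impurity 1 - sum(p_i ** 2), computed with exact fractions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())

def gini_gain(data, attr):
    """Gini of the whole set minus the size-weighted Gini of each branch of attr."""
    targets = [t for _, t in data]
    weighted = Fraction(0)
    for value in {rec[attr] for rec, _ in data}:
        branch = [t for rec, t in data if rec[attr] == value]
        weighted += Fraction(len(branch), len(data)) * gini(branch)
    return gini(targets) - weighted

gains = {attr: gini_gain(DATA, attr) for attr in ATTRIBUTES}
best = max(gains, key=gains.get)  # attribute with the highest Gini gain
print(best)  # Student
```

Picking the attribute with the largest Gini gain is the same as picking the one whose weighted child impurity is smallest, since Gini(X) is the same for every candidate.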

 

Fazal Rehman Shamil