Gini index for binary variables in data mining

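What is the Gini index?

The Gini index, also referred to as the Gini ratio or Gini coefficient, is a commonly used measure of inequality. In data mining it measures how mixed the target classes are within a set of records, and decision trees use it to choose the attribute to split on.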
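
For reference, the two quantities used in the worked example can be written as below. This is the standard textbook form; the symbols k, p_i, and X_v are notation assumed here for illustration and do not appear in the original text. X is the full set of records, p_i is the proportion of records in class i, and X_v is the subset of records where attribute A takes the value v.

```latex
\mathrm{Gini}(X) = 1 - \sum_{i=1}^{k} p_i^{2}
\qquad
\mathrm{GiniGain}(A) = \mathrm{Gini}(X) - \sum_{v} \frac{|X_v|}{|X|}\,\mathrm{Gini}(X_v)
```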

The example below calculates the Gini index for binary variables.

Student   inHostel   Target Class
True      True       Yes
True      True       Yes
True      False      No
False     False      Yes
False     True       No
False     True       No
False     False      No
True      False      Yes
False     True       No

Now we will calculate the Gini gain of the two attributes, Student and inHostel.

Step 1:

The target class has 4 Yes and 5 No values among the 9 records, so:

Gini(X) = 1 - [(4/9)^2 + (5/9)^2] = 40/81
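
Step 1 can be reproduced with a short Python sketch. The function name `gini` and the list `target` are illustrative names, not part of the original tutorial; exact fractions are used so the result can be compared with 40/81 directly.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


# Target Class column of the example table, in row order (4 Yes, 5 No).
target = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

print(gini(target))  # 40/81
```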

Step 2:

Gini(Student = False) = 1 - [(1/5)^2 + (4/5)^2] = 8/25

Gini(Student = True) = 1 - [(3/4)^2 + (1/4)^2] = 3/8

GiniGain(Student) = Gini(X) - [4/9 · Gini(Student = True) + 5/9 · Gini(Student = False)] = 0.149
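
As a check on the rounded value, the arithmetic for Student can be carried out with exact fractions, using the values from Steps 1 and 2:

```latex
\begin{aligned}
\mathrm{GiniGain}(\mathrm{Student})
  &= \frac{40}{81} - \left[\frac{4}{9}\cdot\frac{3}{8} + \frac{5}{9}\cdot\frac{8}{25}\right] \\
  &= \frac{40}{81} - \left[\frac{1}{6} + \frac{8}{45}\right]
   = \frac{40}{81} - \frac{31}{90} \\
  &= \frac{121}{810} \approx 0.149
\end{aligned}
```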

Step 3:

Gini(inHostel = False) = 1 - [(2/4)^2 + (2/4)^2] = 1/2

Gini(inHostel = True) = 1 - [(2/5)^2 + (3/5)^2] = 12/25

GiniGain(inHostel) = Gini(X) - [5/9 · Gini(inHostel = True) + 4/9 · Gini(inHostel = False)] = 0.005
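
The same exact-fraction check for inHostel shows that this split removes almost none of the impurity:

```latex
\begin{aligned}
\mathrm{GiniGain}(\mathrm{inHostel})
  &= \frac{40}{81} - \left[\frac{5}{9}\cdot\frac{12}{25} + \frac{4}{9}\cdot\frac{1}{2}\right] \\
  &= \frac{40}{81} - \left[\frac{4}{15} + \frac{2}{9}\right]
   = \frac{40}{81} - \frac{22}{45} \\
  &= \frac{2}{405} \approx 0.005
\end{aligned}
```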

Results

The best attribute to split on is Student because it has the higher Gini gain (0.149 versus 0.005 for inHostel).
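
The whole comparison can be sketched in Python as follows. The names `rows`, `gini`, and `gini_gain` are hypothetical helpers introduced for illustration, and the table is encoded with booleans for the two attributes; this is one minimal way to reproduce the numbers, not a production decision-tree implementation.

```python
from collections import Counter
from fractions import Fraction

# Rows of the example table as (Student, inHostel, Target Class).
rows = [
    (True, True, "Yes"),
    (True, True, "Yes"),
    (True, False, "No"),
    (False, False, "Yes"),
    (False, True, "No"),
    (False, True, "No"),
    (False, False, "No"),
    (True, False, "Yes"),
    (False, True, "No"),
]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, col):
    """Gini of all rows minus the size-weighted Gini of each partition on column col."""
    target = [r[-1] for r in rows]
    weighted = Fraction(0)
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        weighted += Fraction(len(subset), len(rows)) * gini(subset)
    return gini(target) - weighted


gains = {"Student": gini_gain(rows, 0), "inHostel": gini_gain(rows, 1)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(name, gain, round(float(gain), 3))
# Student 121/810 0.149
# inHostel 2/405 0.005
print("Best split:", best)  # Best split: Student
```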

 

Fazal Rehman Shamil