# Correlation analysis of Nominal data with Chi-Square Test in Data Mining

## Chi-Square Test

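Correlation analysis of nominal data can be done with the chi-square test. The chi-square test checks whether two nominal attributes are correlated by comparing the observed counts with the counts that would be expected if the attributes were independent.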

## Correlation VS Causality:

Correlation does not always tell us about causality.

Example:

• The number of students who pass an exam and the number of car thefts in a country may be correlated with each other, but that does not mean that the number of students who pass affects car theft in the country.

But in some cases a causal link is plausible:

• The number of students who pass the exam and the number of students who live near the university are correlated with each other, and living near the university may be one cause of the students' results.

| | Passed student | Not passed student | Sum |
|---|---|---|---|
| Live near University | Observed = 140; Expected = 180 × 330 / 1320 = 45 | Observed = 190; Expected = 1140 × 330 / 1320 = 285 | 330 |
| Not live near University | Observed = 40; Expected = 180 × 990 / 1320 = 135 | Observed = 950; Expected = 1140 × 990 / 1320 = 855 | 990 |
| Sum | 140 + 40 = 180 | 190 + 950 = 1140 | 1320 |
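The expected counts in the example can be reproduced with a few lines of Python. The sketch below is only an illustration: the variable names are made up here, and the last step assumes the standard Pearson chi-square statistic, the sum of (Observed - Expected)² / Expected over all cells.

```python
# Observed counts from the example
# (rows: live near / not live near the university; columns: passed / not passed).
observed = [
    [140, 190],
    [40, 950],
]

row_sums = [sum(row) for row in observed]        # 330 and 990
col_sums = [sum(col) for col in zip(*observed)]  # 180 and 1140
total = sum(row_sums)                            # 1320

# Expected count of a cell = (row sum * column sum) / grand total.
expected = [[r * c / total for c in col_sums] for r in row_sums]

# Pearson chi-square statistic (assumed here): sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)              # [[45.0, 285.0], [135.0, 855.0]]
print(round(chi_square, 2))  # 309.63
```

Each cell differs from its expected count by 95, so the statistic comes out large (about 309.63), which points to a strong association between living near the university and passing.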

## Degrees of freedom:

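DF = (r - 1) * (c - 1)

Here r is the number of rows and c is the number of columns of the table, not counting the Sum row and column.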
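As a quick check of the formula, here is a minimal sketch; the helper name `degrees_of_freedom` is hypothetical and chosen only for this illustration. For the student example there are two rows (live near, not live near) and two columns (passed, not passed).

```python
def degrees_of_freedom(rows: int, cols: int) -> int:
    """DF = (r - 1) * (c - 1) for a table with r rows and c columns."""
    return (rows - 1) * (cols - 1)


# The student example has 2 rows and 2 columns (Sum row and column excluded).
print(degrees_of_freedom(2, 2))  # 1

# A larger table, e.g. 3 rows by 4 columns.
print(degrees_of_freedom(3, 4))  # 6
```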

## Level of significance:

 0.01 0.05 0.1