
Naive Bayes classifier in data mining

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem.
It is called “naïve” because it assumes that all features (input variables) are independent of each other given the class. This assumption rarely holds exactly in real life, yet it still works surprisingly well in many practical cases.

Bayes’ Theorem

P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}

Where:

P(A|B) is the posterior probability of class A given evidence B.
P(B|A) is the likelihood of evidence B given class A.
P(A) is the prior probability of class A.
P(B) is the probability of the evidence B.
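
As a quick sketch (not part of the original tutorial; the function name bayes_posterior and the sample numbers are only illustrative), the theorem maps directly to one line of Python:

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative numbers only: P(B|A) = 0.9, P(A) = 0.4, P(B) = 0.5
print(bayes_posterior(0.9, 0.4, 0.5))   # 0.72
```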

Working of Naïve Bayes Classifier

  1. Calculate prior probabilities for each class.

  2. Calculate likelihood (probability of each feature given a class).

  3. Apply Bayes’ theorem to find the posterior probability for each class.

  4. The class with the highest posterior probability is the predicted class (a short code sketch of these steps follows below).
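
These four steps can be sketched as a small frequency-counting classifier for categorical features. This is a minimal Python illustration, not a full implementation: the class name SimpleNaiveBayes and its fit/predict methods are chosen here for clarity, and Laplace smoothing is omitted so the numbers match the hand calculation in the example below.

```python
from collections import Counter, defaultdict

class SimpleNaiveBayes:
    """Minimal Naive Bayes for categorical features (no smoothing)."""

    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        counts = Counter(y)
        # Step 1: prior probabilities P(class)
        self.priors = {c: counts[c] / n for c in self.classes}
        # Step 2: likelihoods P(feature_i = value | class)
        self.likelihoods = {c: defaultdict(Counter) for c in self.classes}
        for xi, c in zip(X, y):
            for i, value in enumerate(xi):
                self.likelihoods[c][i][value] += 1
        for c in self.classes:
            for i in self.likelihoods[c]:
                for value in self.likelihoods[c][i]:
                    self.likelihoods[c][i][value] /= counts[c]

    def predict(self, xi):
        # Step 3: posterior score = prior * product of likelihoods (evidence cancels)
        scores = {}
        for c in self.classes:
            score = self.priors[c]
            for i, value in enumerate(xi):
                score *= self.likelihoods[c][i].get(value, 0.0)
            scores[c] = score
        # Step 4: predict the class with the highest posterior score
        return max(scores, key=scores.get)

# Data from the spam example below: one feature, whether "Offer" is present
X = [["Yes"], ["Yes"], ["No"], ["Yes"], ["No"]]
y = ["Spam", "Spam", "Not Spam", "Not Spam", "Not Spam"]
nb = SimpleNaiveBayes()
nb.fit(X, y)
print(nb.predict(["Yes"]))   # Spam
```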


Example — Email Spam Classification

Let’s say we want to classify whether an email is “Spam” or “Not Spam” using the presence of the word “Offer”.

Step 1: Given Data

Email    Word “Offer” Present    Class
1        Yes                     Spam
2        Yes                     Spam
3        No                      Not Spam
4        Yes                     Not Spam
5        No                      Not Spam

Step 2: Calculate Priors

P(Spam) = \frac{2}{5} = 0.4
P(NotSpam) = \frac{3}{5} = 0.6


Step 3: Calculate Likelihoods

P(Offer = Yes | Spam) = \frac{2}{2} = 1
P(Offer = Yes | NotSpam) = \frac{1}{3} \approx 0.33
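
These priors and likelihoods are easy to verify with a few lines of Python (a small sketch that hard-codes the five rows from the Step 1 table):

```python
from collections import Counter

# (word "Offer" present?, class) for the five emails in Step 1
emails = [("Yes", "Spam"), ("Yes", "Spam"), ("No", "Not Spam"),
          ("Yes", "Not Spam"), ("No", "Not Spam")]

class_counts = Counter(label for _, label in emails)

# Step 2: priors P(class)
priors = {c: class_counts[c] / len(emails) for c in class_counts}

# Step 3: likelihood of "Offer" = Yes given each class
likelihood_yes = {
    c: sum(1 for word, label in emails if label == c and word == "Yes") / class_counts[c]
    for c in class_counts
}

print(priors)          # {'Spam': 0.4, 'Not Spam': 0.6}
print(likelihood_yes)  # {'Spam': 1.0, 'Not Spam': 0.3333...}
```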


Step 4: Apply Bayes’ Theorem

We want to find whether an email with “Offer” is spam.

P(Spam | Offer = Yes) = \frac{P(Offer = Yes | Spam) \times P(Spam)}{P(Offer = Yes)}

But since we only compare probabilities across classes, we can ignore P(Offer = Yes) because it is the same for all classes.

P(Spam | Offer = Yes) \propto 1 \times 0.4 = 0.4
P(NotSpam | Offer = Yes) \propto 0.33 \times 0.6 = 0.198


Step 5: Compare and Classify

Since 0.4 > 0.198:

\boxed{\text{Email is classified as SPAM.}}
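
The last two steps can also be checked in code. This sketch hard-codes the values from Steps 2 and 3; keeping 1/3 unrounded gives 0.2 instead of the rounded 0.198, and dividing by the sum of the two scores recovers the actual posterior probability of spam.

```python
# Values from Steps 2 and 3
priors = {"Spam": 0.4, "Not Spam": 0.6}
likelihood_yes = {"Spam": 2 / 2, "Not Spam": 1 / 3}

# Step 4: unnormalized posteriors; P(Offer = Yes) cancels in the comparison
score_spam = likelihood_yes["Spam"] * priors["Spam"]              # 1.0 * 0.4 = 0.4
score_not_spam = likelihood_yes["Not Spam"] * priors["Not Spam"]  # (1/3) * 0.6 = 0.2

# Step 5: pick the class with the larger score
prediction = "Spam" if score_spam > score_not_spam else "Not Spam"

# Normalizing gives the true posterior: 0.4 / (0.4 + 0.2) = 0.667 (approx.)
p_spam = score_spam / (score_spam + score_not_spam)
print(prediction, round(p_spam, 3))   # Spam 0.667
```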


Applications of Naïve Bayes

  1. Spam filtering and email classification.
  2. Sentiment analysis and text/document classification.
  3. Medical diagnosis (predicting a disease from observed symptoms).
  4. Recommendation systems.
  5. Real-time prediction, since training and classification are fast.

Next Similar Tutorials

  1. Bayesian Networks MCQs | Artificial Intelligence

  2. Decision tree induction on categorical attributes
  3. Decision Tree Induction and Entropy in data mining – Click Here
  4. Overfitting of decision tree and tree pruning – Click Here
  5. Attribute selection Measures – Click Here
  6. Computing Information-Gain for Continuous-Valued Attributes in data mining – Click Here
  7. Gini index for binary variables – Click Here
  8. Bagging and Bootstrap in Data Mining, Machine Learning – Click Here
  9. Evaluation of a classifier by confusion matrix in data mining – Click Here
  10. Holdout method for evaluating a classifier in data mining – Click Here
  11. RainForest Algorithm / Framework – Click Here
  12. Boosting in data mining – Click Here
  13. Naive Bayes Classifier  – Click Here