Naive Bayes classifier in data mining

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem.
It is called “naïve” because it assumes that all features (input variables) are independent of each other — which is rarely true in real life, but this assumption still works surprisingly well in many practical cases.

Bayes’ Theorem

P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}

Where:

  • P(A|B): Probability of class A given feature B (posterior probability)

  • P(B|A): Probability of feature B given class A (likelihood)

  • P(A): Probability of class A (prior probability)

  • P(B): Probability of feature B (evidence)
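
With several features x_1, ..., x_n, the naïve independence assumption lets the likelihood factor into a product of per-feature terms, so the classifier picks the class C that maximizes:

P(C) \times P(x_1 | C) \times P(x_2 | C) \times \dots \times P(x_n | C)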

Working of Naïve Bayes Classifier

  1. Calculate prior probabilities for each class.

  2. Calculate likelihood (probability of each feature given a class).

  3. Apply Bayes’ theorem to find the posterior probability for each class.

  4. The class with the highest posterior probability is the predicted class.
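
These four steps translate directly into a short program. The sketch below is a minimal Python illustration (the function names train_naive_bayes and predict are our own, not from any library), using raw counts and no smoothing:

from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    # X is a list of feature dictionaries, y a list of class labels.
    n = len(y)
    class_counts = Counter(y)
    # Step 1: prior probability of each class.
    priors = {c: count / n for c, count in class_counts.items()}
    # Step 2: count each (class, feature, value) combination so a
    # likelihood can be read off later as count / class size.
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in zip(X, y):
        for name, value in features.items():
            value_counts[label][name][value] += 1
    return priors, value_counts, class_counts

def predict(features, priors, value_counts, class_counts):
    # Steps 3 and 4: score each class by prior times the product of
    # per-feature likelihoods, then pick the class with the top score.
    scores = {}
    for c, prior in priors.items():
        score = prior
        for name, value in features.items():
            score *= value_counts[c][name][value] / class_counts[c]
        scores[c] = score
    return max(scores, key=scores.get), scores

Because the evidence term is identical for every class, these scores are unnormalized posteriors; comparing them is enough to classify.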


Example — Email Spam Classification

Let’s say we want to classify whether an email is “Spam” or “Not Spam” using the presence of the word “Offer”.

Step 1: Given Data

Email   Word “Offer” Present   Class
1       Yes                    Spam
2       Yes                    Spam
3       No                     Not Spam
4       Yes                    Not Spam
5       No                     Not Spam

Step 2: Calculate Priors

P(Spam) = \frac{2}{5} = 0.4
P(NotSpam) = \frac{3}{5} = 0.6


Step 3: Calculate Likelihoods

P(Offer = Yes | Spam) = \frac{2}{2} = 1
P(Offer = Yes | NotSpam) = \frac{1}{3} \approx 0.33


Step 4: Apply Bayes’ Theorem

We want to find whether an email with “Offer” is spam.

P(Spam | Offer = Yes) = \frac{P(Offer = Yes | Spam) \times P(Spam)}{P(Offer = Yes)}

But since we only compare probabilities across classes, we can ignore P(Offer = Yes) because it is the same for all classes.

P(Spam | Offer = Yes) \propto 1 \times 0.4 = 0.4
P(NotSpam | Offer = Yes) \propto \frac{1}{3} \times 0.6 = 0.2
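
If actual posterior probabilities are needed, divide each score by their sum (0.4 + 0.2 = 0.6):

P(Spam | Offer = Yes) = \frac{0.4}{0.6} \approx 0.67
P(NotSpam | Offer = Yes) = \frac{0.2}{0.6} \approx 0.33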


Step 5: Compare and Classify

Since 0.4 > 0.2:

The email is classified as Spam.
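
As a check on the arithmetic, the illustrative train_naive_bayes and predict functions sketched earlier can be run on this five-email dataset:

X = [{"offer": "yes"}, {"offer": "yes"}, {"offer": "no"},
     {"offer": "yes"}, {"offer": "no"}]
y = ["spam", "spam", "not_spam", "not_spam", "not_spam"]

priors, value_counts, class_counts = train_naive_bayes(X, y)
label, scores = predict({"offer": "yes"}, priors, value_counts, class_counts)
print(label)   # spam
print(scores)  # {'spam': 0.4, 'not_spam': 0.2} up to floating-point rounding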


Applications of Naïve Bayes

  • Spam filtering (Gmail, Yahoo Mail)

  • Sentiment analysis (Positive/Negative reviews)

  • Document classification

  • Medical diagnosis

  • Weather prediction


Next Similar Tutorials

  1. Bayesian Networks MCQs | Artificial Intelligence
  2. Decision tree induction on categorical attributes
  3. Decision Tree Induction and Entropy in data mining
  4. Overfitting of decision tree and tree pruning
  5. Attribute selection Measures
  6. Computing Information-Gain for Continuous-Valued Attributes in data mining
  7. Gini index for binary variables
  8. Bagging and Bootstrap in Data Mining, Machine Learning
  9. Evaluation of a classifier by confusion matrix in data mining
  10. Holdout method for evaluating a classifier in data mining
  11. RainForest Algorithm / Framework
  12. Boosting in data mining
  13. Naive Bayes Classifier
