By Akshitha Singareddy

Naive Bayes Classifier for Text Classification

Updated: Nov 27, 2022

Objective: The main objective of this blog is to demonstrate how to build a Naïve Bayes Classifier (NBC) for text data from scratch, without using any external machine learning modules. We are given the Ford Sentence Classification Dataset [1] (Kaggle link: Ford Sentence Classification). This dataset consists of a corpus of text data labelled with 6 different categories: Responsibility, Requirement, Soft Skill, Experience, Skill and Education. Given a new sentence, the Naïve Bayes Classifier is supposed to find out which type the sentence belongs to using Bayes' theorem. Let us see what exactly a Naïve Bayes Classifier is and how I built one to classify text without using any external modules.

Introduction: Natural Language Processing (NLP) is the ability of a computer to understand and interpret human languages. Text classification is one application of NLP, where an algorithm classifies a given text into a class based on its words. In recent times, NLP has been one of the trendiest technologies to learn. In this blog, we are going to build a text classifier using a very simple yet powerful algorithm called Naïve Bayes. The dataset mentioned above [1] has around 60k training samples and 15k test samples, and is part of the Ford Business Hiring Challenge.

What is Text Classification?

Text classification is a machine learning technique in which we develop a model that assigns each text the label it belongs to. One good example of text classification is sentiment analysis, where we are supposed to tag each text as either positive or negative. Text classification has many applications in real life; Twitter, for example, uses sentiment analysis to flag inappropriate tweets. Most real-world text is in an unstructured format, and it takes a lot of effort to clean and preprocess it.

Naïve Bayes Classifier: Naïve Bayes is a supervised learning algorithm based on Bayes' theorem. The algorithm assumes that the features in the dataset are independent of each other. It is a simple yet powerful algorithm that can be used for text classification tasks, and it can be implemented without any external libraries. To understand Naïve Bayes, we first need to understand the concept of conditional probability. Bayes' theorem states that the current probability of an event can be estimated using past data related to that event. Assume X is a data point with features X1, X2, …, Xd. To calculate the probability that X belongs to class Y, Bayes' theorem gives:

P(Y | X1, X2, …, Xd) = P(X1, X2, …, Xd | Y) · P(Y) / P(X1, X2, …, Xd)

where P(Y | X1, X2, …, Xd) is the probability that the data X belongs to class Y, also called the Posterior Probability. P(X1, X2, …, Xd | Y) is the probability of observing X when Y is true, also called the Likelihood. P(Y) is the probability that Y occurs, also called the Prior Probability. P(X1, X2, …, Xd) is the probability that X occurs, also called the Marginal Probability.


Implementation: As the algorithm is easy to implement, no external machine learning modules are required. Below are the steps followed to implement this assignment.

Step 1: Loading the data and pre-processing

As the data is in CSV format, pandas is used to read it into a data frame. There are two CSV files on Kaggle, namely train_data and test_data, of which test_data has no labels in it. The dataset also contains null values, which I dropped using pandas before splitting. Once the data is loaded into the Python environment, the data frame is split into train and test sets in an 80/20 ratio.
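A minimal sketch of this step (the file name train_data.csv and the column names Type and New_Sentence are my assumptions about the Kaggle files, not confirmed by the post):

import pandas as pd

# Load the labelled file; the unlabelled test_data.csv is not used here.
df = pd.read_csv("train_data.csv")

# Drop rows with null values before splitting.
df = df.dropna()

# Shuffle, then split into train and test sets (80/20).
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
split = int(0.8 * len(df))
train_df = df.iloc[:split].copy()
test_df = df.iloc[split:].copy()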


Step 2: Calculating Probabilities

The core idea of Bayes' theorem here is to find the Y that maximizes P(X1, X2, …, Xd | Y) · P(Y). So, we will build dictionaries for the probability of each Y and of each word in each class. We start by calculating the priors. The code below calculates and prints the prior for each class available in the dataset.
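A sketch of the prior computation, using the same assumed column names as above:

# Prior P(Y): the fraction of training sentences carrying each label.
prior = (train_df["Type"].value_counts() / len(train_df)).to_dict()
print(prior)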


Now, we will build a nested dictionary to store the word counts needed for the conditional probabilities of each class. The dictionary is built in such a way that the keys are classes and the values are dictionaries whose keys are words and whose values are the counts of those words in the class. The code below demonstrates building this dictionary.
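A sketch of the nested dictionary (column names are the same assumptions as before):

# word_counts[class][word] -> number of times the word appears in that class.
word_counts = {}
for label, sentence in zip(train_df["Type"], train_df["New_Sentence"]):
    counts = word_counts.setdefault(label, {})
    for word in str(sentence).split():
        counts[word] = counts.get(word, 0) + 1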

Once the dictionary is built, the likelihood of every word is also calculated, i.e. the probability of that word occurring given a class type.
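A minimal sketch of the likelihood computation, following the signature described below; the alpha parameter applies the Laplace smoothing discussed in the Experiments section, and the extra return values (class totals and vocabulary size) are my additions so smoothing can be applied at prediction time:

def calculate_likelihoods(train_df, prior, alpha=0):
    # Vocabulary size K, needed for Laplace smoothing.
    vocab = {w for s in train_df["New_Sentence"] for w in str(s).split()}
    k = len(vocab)
    likelihoods, totals = {}, {}
    for label in prior:
        counts = {}
        for s in train_df.loc[train_df["Type"] == label, "New_Sentence"]:
            for w in str(s).split():
                counts[w] = counts.get(w, 0) + 1
        total = sum(counts.values())
        totals[label] = total
        # P(word | class) = (count + alpha) / (total + alpha * K)
        likelihoods[label] = {w: (c + alpha) / (total + alpha * k)
                              for w, c in counts.items()}
    return likelihoods, totals, k

likelihoods, totals, k = calculate_likelihoods(train_df, prior, alpha=0)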

The function calculate_likelihoods above takes three parameters, namely train_df, prior and alpha. Here alpha is the smoothing parameter, which we will learn more about in the coming sections. Now that we have likelihoods and priors, it's time to predict which type a sentence belongs to using the past data. Below is the function that predicts the class of a given sentence, with an optional alpha.
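A sketch of the prediction function; it reads prior, likelihoods, totals and k from the enclosing scope, and works in log space to avoid numerical underflow (my choice, not stated in the post):

import math

def predict(sentence, alpha=0):
    # Choose the class Y maximizing log P(Y) + sum over words of log P(word | Y).
    best_label, best_score = None, float("-inf")
    for label in prior:
        score = math.log(prior[label])
        for word in str(sentence).split():
            # Unseen words get the smoothed default; with alpha == 0 they are ignored.
            p = likelihoods[label].get(
                word, alpha / (totals[label] + alpha * k) if alpha else 0.0)
            if p > 0:
                score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label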

Now, it's time to predict. Let us run the predict function on every sentence in the test dataset using the apply function from the pandas library. Once the predictions are done, the accuracy is calculated and printed.
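A sketch of this step, under the same column-name assumptions:

preds = test_df["New_Sentence"].apply(predict)
accuracy = (preds == test_df["Type"]).mean()
print(f"Accuracy: {accuracy:.1%}")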


Without smoothing, the Naïve Bayes Classifier achieved an accuracy of 54.3%.

Experiments: Once the pipeline was built, the following experiments were done.

1. Compared the effect of Laplace smoothing. Laplace smoothing is a technique that handles the zero-probability problem when calculating probabilities. Consider a case where you need to classify a text into a class, but one of its words does not exist in the likelihood table. Basic approaches are to return zero for the whole product or to ignore the term, but both can be detrimental. To avoid these problems, smoothing is applied while calculating the probabilities.

Laplace smoothing modifies the likelihood as follows:

P(word | class) = (count(word, class) + alpha) / (count(class) + alpha · K)

where alpha is the smoothing parameter and K is the number of features (distinct words) available in the dataset.

I have experimented with alpha values 0, 1, 5, 10, 50, 100 and 1000. From my experimentation, the best alpha value is 1; beyond 1, accuracy decreases as alpha increases. Below is the plot of accuracy vs alpha on the test dataset.
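A sketch of the experiment loop that could produce such a plot (the matplotlib usage and the symlog axis, which accommodates alpha = 0, are my choices, not stated in the post):

import matplotlib.pyplot as plt

alphas = [0, 1, 5, 10, 50, 100, 1000]
accuracies = []
for a in alphas:
    # Recompute smoothed likelihoods, then score the test set.
    likelihoods, totals, k = calculate_likelihoods(train_df, prior, alpha=a)
    preds = test_df["New_Sentence"].apply(predict, alpha=a)
    accuracies.append((preds == test_df["Type"]).mean())

plt.plot(alphas, accuracies, marker="o")
plt.xscale("symlog")  # alpha spans 0 to 1000
plt.xlabel("alpha")
plt.ylabel("accuracy")
plt.title("Accuracy vs alpha on the test set")
plt.show()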

2. Derived the top 10 words that predict each class. Using the likelihoods, I derived the top 10 words for each class. Below is the code used to derive them, along with the results.
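A sketch of this derivation, ranking words by P(word | class) from the likelihood dictionary built earlier:

# Top 10 words per class, ranked by likelihood.
for label, probs in likelihoods.items():
    top10 = sorted(probs, key=probs.get, reverse=True)[:10]
    print(label, ":", ", ".join(top10))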


3. Effect of data cleaning using NLP techniques such as NLTK's RegexpTokenizer and stop-word removal.

I tried applying techniques like stop-word removal, lowercasing all words, and filtering tokens using regular expressions; the code to do this is commented out in various sections above. A sketch of this kind of cleaning is shown below.
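A minimal sketch of the cleaning step with NLTK, under the same column-name assumptions; the exact tokenizer pattern is my choice:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

nltk.download("stopwords")  # one-time download of the stop-word list

tokenizer = RegexpTokenizer(r"[a-zA-Z]+")  # keep alphabetic tokens only
stop_words = set(stopwords.words("english"))

def clean(sentence):
    # Lowercase, tokenize with a regular expression, and drop stop words.
    tokens = tokenizer.tokenize(str(sentence).lower())
    return " ".join(w for w in tokens if w not in stop_words)

train_df["New_Sentence"] = train_df["New_Sentence"].apply(clean)
test_df["New_Sentence"] = test_df["New_Sentence"].apply(clean)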

My Contributions: Below is the list of my contributions in building a text classifier using the Naïve Bayes algorithm.

1. I followed blogs [2], [3] and [4] to understand Naïve Bayes and Laplace smoothing.
2. Learnt Bayes' theorem and how to implement NBC from scratch without using any external libraries.
3. Deep dived into conditional probability concepts.
4. Encountered some challenges while implementing NBC without any modules, but through a thorough understanding of the concepts I was able to solve them.
5. Understood the applications of text classification in real-life scenarios.
6. Experimented with various alpha values to find the best smoothing factor.
7. Learnt how to preprocess text data using various built-in functions and external modules in Python.

Challenges Encountered and Resolutions: While solving the assignment, I encountered a few challenges. Below are the challenges and how I resolved them.

1. The biggest challenge I encountered was formulating the algorithm efficiently without any libraries. To resolve this, I experimented with various approaches and analyzed how long each implementation took. I used nested dictionaries to store the data efficiently, and I could also reduce lines of code by using Python functions such as dict.get.

2. I also found it a little difficult to understand the NBC algorithm. By going through lecture notes and some online blogs, I got a clear understanding of the algorithm, which eventually helped me solve this assignment.


References:

[1] Ford Sentence Classification Dataset, Kaggle.