Spam detection using Machine Learning
Importing Libraries
Importing Data
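A minimal sketch of the loading and inspection steps, assuming the data is a CSV whose first two columns are named v1 and v2, as the info output in this section reports. The file name spam.csv and the latin-1 encoding mentioned in the comment are assumptions, not taken from the notebook. An in-memory sample stands in for the file so the block runs on its own.

```python
import io

import pandas as pd

# Two rows copied from the preview in this notebook stand in for the real file.
# With the real data the call would look something like
# pd.read_csv("spam.csv", encoding="latin-1"); the file name and the encoding
# are both assumptions.
sample_csv = io.StringIO(
    "v1,v2\n"
    "ham,Ok lar... Joking wif u oni...\n"
    "ham,U dun say so early hor... U c already then say...\n"
)
df = pd.read_csv(sample_csv)

print(df.head())      # first rows of the frame
df.info()             # column names, non-null counts and dtypes
print(df.describe())  # count / unique / top / freq for object columns
```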
Looking into data
   v1    v2
0  ham   Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got …
1  ham   Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ent…
3  ham   U dun say so early hor... U c already then say...
4  ham   Nah I don't think he goes to usf, he lives around here though
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   v1          5572 non-null   object
 1   v2          5572 non-null   object
 2   Unnamed: 2  50 non-null     object
 3   Unnamed: 3  12 non-null     object
 4   Unnamed: 4  6 non-null      object
dtypes: object(5)
memory usage: 217.8+ KB
        v1    v2
count   5572  5572
unique  2     5169
top     ham   Sorry, I'll call later
freq    4825  30
Data Preprocessing
Getting required columns
Changing column names
Creating a 'Target' column as a numerical representation of the 'Label' column
Creating 'Text_length' column
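The four preprocessing steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's original code: the ham -> 0, spam -> 1 mapping is assumed (it is consistent with the Target mean of about 0.134 reported in this section), and Text_length is assumed to be the character count of each message. The second sample message is shortened for illustration.

```python
import pandas as pd

# Stand-in for the raw frame: the label and text columns plus one of the
# mostly empty "Unnamed" columns reported by the info output.
df = pd.DataFrame({
    "v1": ["ham", "spam"],
    "v2": ["Ok lar... Joking wif u oni...",
           "Free entry in 2 a wkly comp"],  # shortened spam message
    "Unnamed: 2": [None, None],
})

# Getting required columns: keep only the label and the message text.
df = df[["v1", "v2"]].copy()

# Changing column names to the ones shown in the later info output.
df = df.rename(columns={"v1": "Label", "v2": "Text_data"})

# Numerical representation of 'Label' (assumed mapping: ham -> 0, spam -> 1).
df["Target"] = df["Label"].map({"ham": 0, "spam": 1})

# Message length in characters (assumed definition of 'Text_length').
df["Text_length"] = df["Text_data"].str.len()

print(df)
```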
   Label  Text_data
0  ham    Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got …
1  ham    Ok lar... Joking wif u oni...
2  spam   Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ent…
3  ham    U dun say so early hor... U c already then say...
4  ham    Nah I don't think he goes to usf, he lives around here though
5  spam   FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it …
6  ham    Even my brother is not like to speak with me. They treat me like aids patent.
7  ham    As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callert…
8  spam   WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To clai…
9  spam   Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera …
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Label        5572 non-null   object
 1   Text_data    5572 non-null   object
 2   Target       5572 non-null   int64
 3   Text_length  5572 non-null   int64
dtypes: int64(2), object(2)
memory usage: 174.2+ KB
       Target        Text_length
count  5572          5572
mean   0.134063173   80.05832735
std    0.3407507549  59.62393672
min    0             2
25%    0             36
50%    0             61
75%    0             121
max    1             910
{'are', 'again', 'she', 'of', 'only', 've', "weren't", 'yourself', 'it', 'so', "she's", 'from', 'is', 'shan', 'those', 'out', 'yourselves', 'were', 'was', "mightn't", 'few', 'and', 'as', 'theirs', 'such', 'my', 'hers', 'a', 'has', 't', 'm', "isn't", 'same', 'all', 'but', 'ourselves', 'be', 'doing', "shouldn't", "don't", 'i', 'y', 'had', 'to', 'now', 'does', 'don', "you'll", 'them', 'that', 'the', 'this', 'we', 'until', 'hasn', 'will', 'having', 'when', 'up', 'against', "shan't", 'am', 'weren', 'over', "needn't", 'own', "aren't", 'mightn', 'further', 'above', 'then', 'hadn', 'no', 'being', 'couldn', 'have', 'why', "hasn't", 'under', 'here', 'your', 'below', 'herself', 'which', 'an', 'itself', 'who', 'both', "it's", "you'd", 'his', 'down', 'or', 'not', 'd', "you've", "haven't", "mustn't", 'than', 'him', 'at', 'more', 'yours', 'themselves', "won't", 'with', 'its', 'can', 'o', 'about', 'if', 'each', 'do', 'these', 'off', 'by', 'ain', 'before', 'how', "wasn't", 'just', 'what', "wouldn't", 'you', 'ours', 'himself', 'isn', 'too', 'her', 'me', 'through', 'did', 'because', 'once', 'our', 'where', 'aren', 'they', 'their', 'needn', "you're", 'didn', 'into', 'doesn', 'most', 'very', "couldn't", 'he', 'wasn', 'some', 'll', "hadn't", "doesn't", 'mustn', 'in', 'on', 'between', 'other', 's', 'wouldn', 'myself', 'haven', "should've", 'won', 'there', 'after', "that'll", 'while', 'for', 'should', 're', 'ma', 'any', 'nor', "didn't", 'shouldn', 'during', 'been', 'whom'}
Cleaning data
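The notebook does not show its cleaning code, so the following is only a sketch of one common approach: lowercase the text, replace everything except letters with spaces, and drop stop words. The function name clean_text and the exact steps are assumptions. The small STOP_WORDS set is a subset copied from the stop-word set printed above, used here so the block runs without extra downloads.

```python
import re

# A few entries copied from the stop-word set printed in this notebook;
# the full set would be used in practice.
STOP_WORDS = {"i", "he", "to", "so", "then", "here", "don", "t"}


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation and digits, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("Nah I don't think he goes to usf, he lives around here though"))
```

Applied to a DataFrame, a function like this could be mapped over the Text_data column to produce a cleaned text column.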
   Label  Text_data
0  ham    Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got …
1  ham    Ok lar... Joking wif u oni...
2  spam   Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ent…
3  ham    U dun say so early hor... U c already then say...
4  ham    Nah I don't think he goes to usf, he lives around here though
5  spam   FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it …
6  ham    Even my brother is not like to speak with me. They treat me like aids patent.
7  ham    As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callert…
8  spam   WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To clai…
9  spam   Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera …
Visualization
Top 10 common words before cleaning data
Top 10 common words after cleaning data
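The word-frequency plots themselves are not preserved here. As a rough sketch of how the underlying counts could be obtained, the block below uses collections.Counter on whitespace-split, lowercased messages. The helper name top_words and the tokenization are assumptions; the same helper would be run once on the raw text and once on the cleaned text to get the "before" and "after" rankings.

```python
from collections import Counter

# Three complete messages taken from the preview tables in this notebook.
messages = [
    "Ok lar... Joking wif u oni...",
    "U dun say so early hor... U c already then say...",
    "Nah I don't think he goes to usf, he lives around here though",
]


def top_words(texts, n=10):
    """Return the n most frequent whitespace-separated tokens."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts.most_common(n)


print(top_words(messages, n=10))
```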
Splitting the data into spam and ham subsets so they can be used later if required
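One way to produce the two subsets is boolean indexing on the Label column, sketched here on a small stand-in frame. The variable names spam_df and ham_df are hypothetical, and the spam message is shortened for illustration. Boolean indexing keeps the original row index, which matches the non-consecutive index values visible in the subset previews in this section.

```python
import pandas as pd

df = pd.DataFrame({
    "Label": ["ham", "ham", "spam", "ham"],
    "Text_data": [
        "Ok lar... Joking wif u oni...",
        "U dun say so early hor... U c already then say...",
        "Free entry in 2 a wkly comp",  # shortened spam message
        "I HAVE A DATE ON SUNDAY WITH WILL!!",
    ],
})

# Boolean masks select the rows of each class; the original index is kept.
spam_df = df[df["Label"] == "spam"]
ham_df = df[df["Label"] == "ham"]

print(spam_df)
print(ham_df)
```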
    Label  Text_data
2   spam   Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive ent…
5   spam   FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it …
8   spam   WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To clai…
9   spam   Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera …
11  spam   SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to 87575. Cost 150p/day, 6da…
12  spam   URGENT! You have won a 1 week FREE membership in our £100,000 Prize Jackpot! Txt the word: CLAIM to…
15  spam   XXXMobileMovieClub: To use your credit, click the WAP link in the next txt message or click here>> …
19  spam   England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87…
34  spam   Thanks for your subscription to Ringtone UK your mobile will be charged £5/month Please confirm by …
42  spam   07732584351 - Rodger Burns - MSG = We tried to call you re your reply to our sms for a free nokia m…
    Label  Text_data
0   ham    Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got …
1   ham    Ok lar... Joking wif u oni...
3   ham    U dun say so early hor... U c already then say...
4   ham    Nah I don't think he goes to usf, he lives around here though
6   ham    Even my brother is not like to speak with me. They treat me like aids patent.
7   ham    As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callert…
10  ham    I'm gonna be home soon and i don't want to talk about this stuff anymore tonight, k? I've cried eno…
13  ham    I've been searching for the right words to thank you for this breather. I promise i wont take your …
14  ham    I HAVE A DATE ON SUNDAY WITH WILL!!
16  ham    Oh k...i'm watching here:)
Implementation
Training and Testing set split
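The reports in this section show a support of 1393 test messages, which is exactly 25% of the 5572 rows, so a 75/25 split is a reasonable reading. The sketch below uses scikit-learn's train_test_split with test_size=0.25 on placeholder data. The random_state value and the placeholder messages are assumptions made for the example.

```python
from sklearn.model_selection import train_test_split

# Placeholder messages and targets; the real inputs would be the message
# texts (or their feature vectors) and the Target column.
texts = [f"message {i}" for i in range(20)]
labels = [1 if i % 5 == 0 else 0 for i in range(20)]

# test_size=0.25 matches the 1393 / 5572 ratio seen in the reports;
# random_state is a hypothetical seed, fixed only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

print(len(X_train), len(X_test))
```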
Trying with multiple models
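A hedged sketch of how several classifiers could be trained and scored in one loop. The six model classes are the ones named in the score table of this notebook. Everything else is an assumption: the toy messages are invented for illustration, CountVectorizer is one possible way to turn text into features (the notebook does not show which was used), and all models run with default hyperparameters.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Invented toy data (1 = spam, 0 = ham), far smaller than the real dataset.
train_texts = [
    "win a free prize now", "free entry to win cash",
    "claim your free prize today", "urgent you have won cash",
    "ok see you later", "are you coming home tonight",
    "call me when you are free", "see you at lunch tomorrow",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["win free cash now", "see you tonight"]
test_labels = [1, 0]

# Bag-of-words features: fit on the training text only, then reuse for test.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = [
    LogisticRegression(),
    MultinomialNB(),
    SVC(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(random_state=0),
    KNeighborsClassifier(),
]

scores = {}
for model in models:
    model.fit(X_train, train_labels)
    predictions = model.predict(X_test)
    scores[type(model).__name__] = accuracy_score(test_labels, predictions)

for name, score in scores.items():
    print(name, score)
```

The per-model tables in this section have the layout produced by sklearn.metrics.classification_report, which could be printed inside the same loop.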
Classification Report
              precision    recall  f1-score   support
           0       0.97      1.00      0.99      1196
           1       1.00      0.84      0.91       197
    accuracy                           0.98      1393
   macro avg       0.99      0.92      0.95      1393
weighted avg       0.98      0.98      0.98      1393
Accuracy 0.9777458722182341
Precision 1.0
Recall 0.8426395939086294
F1 score 0.9146005509641872
Classification Report
              precision    recall  f1-score   support
           0       0.99      1.00      0.99      1196
           1       0.98      0.94      0.96       197
    accuracy                           0.99      1393
   macro avg       0.98      0.97      0.98      1393
weighted avg       0.99      0.99      0.99      1393
Accuracy 0.9885139985642498
Precision 0.9788359788359788
Recall 0.9390862944162437
F1 score 0.9585492227979275
Classification Report
              precision    recall  f1-score   support
           0       0.98      1.00      0.99      1196
           1       0.99      0.88      0.93       197
    accuracy                           0.98      1393
   macro avg       0.99      0.94      0.96      1393
weighted avg       0.98      0.98      0.98      1393
Accuracy 0.9820531227566404
Precision 0.9942528735632183
Recall 0.8781725888324873
F1 score 0.9326145552560647
Classification Report
              precision    recall  f1-score   support
           0       0.97      0.99      0.98      1196
           1       0.94      0.83      0.88       197
    accuracy                           0.97      1393
   macro avg       0.95      0.91      0.93      1393
weighted avg       0.97      0.97      0.97      1393
Accuracy 0.9676956209619526
Precision 0.9367816091954023
Recall 0.8274111675126904
F1 score 0.8787061994609165
Classification Report
              precision    recall  f1-score   support
           0       0.97      1.00      0.99      1196
           1       1.00      0.82      0.90       197
    accuracy                           0.97      1393
   macro avg       0.99      0.91      0.94      1393
weighted avg       0.97      0.97      0.97      1393
Accuracy 0.9741564967695621
Precision 1.0
Recall 0.817258883248731
F1 score 0.899441340782123
Classification Report
              precision    recall  f1-score   support
           0       0.90      1.00      0.95      1196
           1       1.00      0.33      0.50       197
    accuracy                           0.91      1393
   macro avg       0.95      0.66      0.72      1393
weighted avg       0.91      0.91      0.88      1393
Accuracy 0.905240488155061
Precision 1.0
Recall 0.3299492385786802
F1 score 0.4961832061068702
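To make the four summary numbers concrete, here is a worked example that recomputes them for the first report in this section. The confusion counts are not printed in the notebook; they are inferred from the reported values: class 1 has support 197, a precision of 1.0 implies no false positives, and a recall of 0.8426... corresponds to 166 of 197 spam messages caught.

```python
# Confusion counts inferred from the first report (not printed in the notebook).
tp = 166   # spam predicted as spam
fn = 31    # spam predicted as ham (197 - 166)
fp = 0     # ham predicted as spam (precision 1.0 means none)
tn = 1196  # ham predicted as ham (all 1196 ham messages)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all messages classified correctly
precision = tp / (tp + fp)                  # share of predicted spam that is spam
recall = tp / (tp + fn)                     # share of actual spam that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("Accuracy", accuracy)
print("Precision", precision)
print("Recall", recall)
print("F1 score", f1)
```

The same arithmetic explains the last report: a precision of 1.0 combined with a recall of about 0.33 means the model never flags ham as spam but misses roughly two thirds of the spam, which pulls F1 down to about 0.50 even though accuracy stays above 0.90.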
Creating a pandas DataFrame with the scores of all the models above
   Model                   Score
1  MultinomialNB           0.98
2  SVC                     0.98
0  LogisticRegression      0.98
4  RandomForestClassifier  0.97
3  DecisionTreeClassifier  0.96
5  KNeighborsClassifier    0.9
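A sketch of how a score table like this could be built and ranked with pandas. The column names Model and Score are hypothetical, and the values are the displayed scores from this section listed in index order 0 to 5. Because three models share the displayed value 0.98, the order among them here can differ from the notebook, where the ranking was presumably based on unrounded scores.

```python
import pandas as pd

# Displayed scores from this section, in index order 0..5.
scores = pd.DataFrame({
    "Model": [
        "LogisticRegression", "MultinomialNB", "SVC",
        "DecisionTreeClassifier", "RandomForestClassifier",
        "KNeighborsClassifier",
    ],
    "Score": [0.98, 0.98, 0.98, 0.96, 0.97, 0.90],
})

# Highest score first; a stable sort keeps tied rows in their original order.
ranked = scores.sort_values(by="Score", ascending=False, kind="stable")

print(ranked)
```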