Q. What is the purpose of the above code?
Ans: Here we drop the original numerical columns so that the scaled values can take their place in the data frame. In the next line, we merge the scaled columns (monthly charges, total charges and tenure) back into the data frame, so that all numerical columns share a common format and lie within the range 0 to 1.
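A minimal sketch of the drop-then-merge step, assuming min-max scaling with scikit-learn's MinMaxScaler (consistent with the 0 to 1 range mentioned above). The toy data frame and the column names tenure, MonthlyCharges, TotalCharges and Churn are assumptions modelled on the common Telco churn dataset, not taken from the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frame; column names are assumed, values are illustrative.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, 1889.50, 108.15, 1840.75],
    "Churn": [0, 0, 1, 0],
})

num_cols = ["tenure", "MonthlyCharges", "TotalCharges"]

# Rescale each numerical column independently to the range [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[num_cols]),
                      columns=num_cols, index=df.index)

# Drop the original numerical columns, then merge the scaled ones back in.
df = pd.concat([df.drop(columns=num_cols), scaled], axis=1)
print(df)
```

Keeping the same index on the scaled frame ensures that pd.concat lines each scaled row up with its original row.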
Q. What do you observe?
Ans: The correlation matrix above shows the correlation coefficients between several variables related to churn:
Below are a few insights that need to be considered:
Model Building (We will build Decision Tree and Logistic Regression models)
Q. What is the purpose of random_state parameter?
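Ans: The random_state parameter fixes the seed of the random shuffle that train_test_split performs, so the data is split the same way on every run. This makes our results reproducible and lets us compare models on exactly the same training and test sets. It does not by itself improve accuracy.
Logistic Regression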
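A small sketch of the reproducibility point, using scikit-learn's train_test_split on a toy array. The seed value 42 and the toy data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, alternating class labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Two calls with the same random_state shuffle identically,
# so they produce exactly the same train/test split.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=42)

print(np.array_equal(X_te1, X_te2))
```

Omitting random_state (or changing it) generally gives a different split on each run, so scores would vary from run to run.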
Q. What do the scores mean? Is this a good model fit based on the scores? Make sure you print all the scores.
Scores are between 0 and 1, with a larger score indicating a better fit.
We calculate four different scores:
Accuracy is the most intuitive performance metric: it is simply the proportion of correctly predicted observations to all observations. It can be misleading when one class is much more frequent than the other.
Precision is the ratio of correctly predicted positive observations to all predicted positive observations.
Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.
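The four definitions above can be checked by counting confusion-matrix cells by hand and comparing against scikit-learn's metric functions. The labels below are made up for illustration (1 = churned) and are not the notebook's data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels (1 = churned, 0 = stayed).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

# Count the four confusion-matrix cells.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print("manual :", accuracy, precision, recall, f1)
print("sklearn:", accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Here TP = 2, TN = 3, FP = 1 and FN = 2, so accuracy is 0.625, precision is 2/3, recall is 0.5 and F1 is 4/7 (about 0.571). The gap between precision and recall shows why a single accuracy number does not tell the whole story.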
Decision Tree
Q. What do the scores mean? Is this a good model fit based on the scores? Make sure you print all the scores.
The scores are interpreted the same way as for Logistic Regression: accuracy, precision, recall and F1 score each lie between 0 and 1, with a larger score indicating a better fit.
Q. Which model performs better? (Hint: compare the metrics)
Ans: After comparing the models, we observed that Logistic Regression has a higher accuracy (0.80) than the Decision Tree model, which indicates that Logistic Regression performs better on this data.
K-fold Cross Validation
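A sketch of how the two models can be compared on the same split while printing all four scores. It uses a synthetic dataset from make_classification in place of the churn data, so the numbers it prints will not match the 0.80 reported above; the dataset size, seeds and max_iter value are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in for the churn data (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each model on the same training set and score it on the same test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Using one fixed split for both models means any difference in the scores comes from the models, not from a different train/test partition.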
Q. What is K-fold cross validation?
Ans: K-fold cross-validation is a resampling procedure for estimating how well a model performs on unseen data. It has a single parameter, k, which sets how many groups (folds) the data sample is split into. The model is trained k times: each time, one fold is held out as the test set and the remaining k-1 folds are used for training, and the k test scores are then averaged. When a specific value of k is chosen, it replaces k in the name, so k=10 gives 10-fold cross-validation.
Q. What do accuracies tell?
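Ans: Each accuracy is the proportion of correct predictions on one held-out fold. Together they show how well the model generalizes to unseen data: the mean accuracy estimates the expected performance, and the spread across folds shows how sensitive the model is to the particular split.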
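The two answers above can be illustrated with scikit-learn's KFold and cross_val_score. This sketch assumes k=10, a Logistic Regression model and synthetic data from make_classification in place of the churn data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the churn data (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# k = 10: the data is split into 10 folds and each fold is the test set once.
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=kfold, scoring="accuracy")

print("Per-fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The array holds one accuracy per fold; the mean is the usual summary, and a large standard deviation would suggest the model's performance depends heavily on which rows land in the test set.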
Feature Selection/Feature Engineering
Q. Has the model improved after feature selection?
Ans: Based on the confusion matrix above, the difference before and after feature selection is modest: accuracy was 0.73 before feature selection and is 0.79 now. However, this accuracy is achieved using only a limited set of features.
In other words, the model achieves better accuracy with fewer features, so the remaining features can be dropped.
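The notebook's feature selection method is not shown here, so this is only a sketch of the idea under an assumed method: scikit-learn's SelectKBest with the ANOVA F-test (f_classif) keeps the top 5 features, and a Logistic Regression model is scored with all features and with the reduced set. The synthetic data, k=5 and the seeds are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data where only a few of the 15 features carry signal (assumption).
X, y = make_classification(n_samples=500, n_features=15, n_informative=4,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Baseline: model trained on all features.
full = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc_full = accuracy_score(y_test, full.predict(X_test))

# Keep the 5 features with the highest F-score, fitted on training data only.
selector = SelectKBest(score_func=f_classif, k=5).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

reduced = LogisticRegression(max_iter=1000).fit(X_train_sel, y_train)
acc_reduced = accuracy_score(y_test, reduced.predict(X_test_sel))

print("Kept feature indices:", selector.get_support(indices=True))
print(f"All features: {acc_full:.3f}  Top 5 features: {acc_reduced:.3f}")
```

Fitting the selector on the training split only avoids leaking information from the test set into the choice of features.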
Q. Print the final results.
Q. Provide recommendations based on the feature selection. What should the company target to reduce churn?
Customer churn has a negative impact on a company's profitability. There are numerous tactics that can be used to reduce client churn. Knowing a company's customers well is the best strategy to prevent churn. This entails identifying clients who are at risk of leaving and making an effort to increase their satisfaction. Naturally, the primary priority for solving this problem is to improve customer service.
To do that, for example, the organization could start a loyalty program with some benefits for senior citizens so that they do not leave the company. Another tactic would be to offer new customers some additional beneficial services right from the start, to prevent early churn.