D208 - Task 1
Multiple regression has a few assumptions that must be met:
- Linear relationships must exist between the dependent and independent variables.
- The residuals must be normally distributed; this is also known as multivariate normality.
- The independent variables must not be too correlated with each other.
- The variance of errors must be relatively equal across all independent variables.
(Assumptions of multiple linear regression, 2020)
While Python, R, and SAS are all appropriate for the task, Python and its packages make quick work of developing a multiple regression model. Python's visualization packages also offer insight into the data: confirming that linear relationships exist through scatterplots, or checking residuals through histograms, is easily done with Seaborn or Matplotlib.
Python packages such as Prince, StatsModels, and Pandas handle the heavy lifting of data analysis and multiple regression creation in a few lines of code that are easy to understand after a quick look through the documentation.
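As a rough sketch of the visual checks mentioned above, the following draws a scatterplot to look for a linear relationship and a histogram of residuals to look for approximate normality. The data are synthetic and the variable names are hypothetical. Matplotlib is used here; Seaborn's scatterplot and histplot functions could be swapped in for the same purpose.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y depends linearly on x, plus normally distributed noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
y = 4.0 + 1.5 * x + rng.normal(scale=2.0, size=300)

# Fit a straight line by least squares and compute residuals (observed minus fitted).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: scatterplot with the fitted line, to judge linearity by eye.
x_sorted = np.sort(x)
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.plot(x_sorted, intercept + slope * x_sorted, color="red")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Linearity check")

# Right panel: histogram of residuals, to judge normality by eye.
ax_hist.hist(residuals, bins=30)
ax_hist.set_xlabel("residual")
ax_hist.set_title("Residual distribution")

fig.tight_layout()
```

Calling plt.show() or fig.savefig with a file name of your choice displays or saves the figure.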
Multiple regression allows multiple independent variables to help explain how a dependent variable behaves under certain conditions. It also allows for predictions based on how a group of independent variables changes.
With regard to the specific question posed above, Tenure can change for a variety of reasons. A multiple regression will show how Tenure changes, and by how much, based on the independent variables. This will allow decisions to be made based on
Data preparation goals:
Importing Pandas, creating a dataframe, and displaying summary statistics.
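A minimal sketch of this first step is shown below. Because the data file is not named here, a small inline dataframe stands in for it; in practice the dataframe would more likely come from pd.read_csv with the path to the actual file. Only the Tenure column is taken from the text; MonthlyCharge, Children, and all of the values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical rows standing in for the real data set; only "Tenure" is named in the text.
df = pd.DataFrame({
    "Tenure": [1.2, 15.5, 6.8, 70.0, 33.4],
    "MonthlyCharge": [172.5, 242.6, 159.9, 119.9, 149.9],
    "Children": [0, 1, 4, 1, 0],
})

# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column.
summary = df.describe()
print(summary)
```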
Assumptions of multiple linear regression. (2020, March 10). https://www.statisticssolutions.com/assumptions-of-multiple-linear-regression/.