import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('/work/Week 10 Team 6/tord_v3_edited.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6415 entries, 0 to 6414
Data columns (total 34 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 6415 non-null int64
1 name 6415 non-null object
2 token 6282 non-null object
3 country 6415 non-null object
4 is_ico 6415 non-null int64
5 is_ieo 6415 non-null int64
6 is_sto 6415 non-null int64
7 ico_start 5636 non-null object
8 ico_end 5497 non-null object
9 price_usd 5928 non-null object
10 raised_usd 2139 non-null float64
11 distributed_in_ico 4870 non-null float64
12 sold_tokens 192 non-null float64
13 token_for_sale 5122 non-null float64
14 whitelist 3882 non-null object
15 kyc 6415 non-null int64
16 bonus 6415 non-null int64
17 restricted_areas 2343 non-null object
18 min_investment 2079 non-null object
19 bounty 6415 non-null int64
20 mvp 1296 non-null object
21 pre_ico_start 2717 non-null object
22 pre_ico_end 2705 non-null object
23 pre_ico_price_usd 1733 non-null object
24 platform 6415 non-null int64
25 accepting 5545 non-null object
26 link_white_paper 5828 non-null object
27 linkedin_link 4355 non-null object
28 github_link 5649 non-null object
29 website 5649 non-null object
30 rating 5709 non-null float64
31 teamsize 4622 non-null float64
32 Coinmarketcap_identifier 1281 non-null float64
33 ERC20 5679 non-null float64
dtypes: float64(8), int64(8), object(18)
memory usage: 1.7+ MB
df.head(20)
df.isna().any()
df.describe()
# Reviewer note: Coinmarketcap_identifier is just an ID for the ICO; give a justified reason for including it among the X variables.
# sanitize "raised_usd"
df['raised_usd'] = df['raised_usd'].fillna(0)
# sanitize independent variables: convert text columns to missingness flags (1 = value missing)
df['token'] = df['token'].isnull().astype(int)
df['ico_start'] = df['ico_start'].isnull().astype(int)
df['rating'] = df['rating'].fillna(0)
df['kyc'] = df['kyc'].fillna(0)
df['bonus'] = df['bonus'].fillna(0)
df['Coinmarketcap_identifier'] = df['Coinmarketcap_identifier'].fillna(0)
df['min_investment'] = df['min_investment'].fillna(0)
df['token_for_sale'] = df['token_for_sale'].fillna(0)
df['platform'] = df['platform'].fillna(0)
df['ERC20'] = df['ERC20'].fillna(0)
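The column-by-column fillna calls above can be collapsed into a single vectorized pass. A minimal sketch on a made-up frame (column names mirror the dataset, values are invented):

```python
import numpy as np
import pandas as pd

# tiny stand-in frame; values are made up for illustration
df = pd.DataFrame({
    'rating':   [4.2, np.nan, 3.1],
    'kyc':      [1.0, np.nan, 0.0],
    'platform': [0.0, 1.0, np.nan],
})

# fill every zero-filled cleanup column in one pass instead of one line per column
fill_zero_cols = ['rating', 'kyc', 'platform']
df[fill_zero_cols] = df[fill_zero_cols].fillna(0)

print(df['rating'].tolist())  # [4.2, 0.0, 3.1]
```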
df['is_min_investment'] = np.where(df['min_investment'] == 0, 0, 1)
df['whitelist'] = np.where(df['whitelist'] == 'Yes', 1, 0)
# flag projects that list the USA among their restricted areas (missing treated as not restricted)
df['is_us_restricted'] = df['restricted_areas'].str.contains('USA', na=False).astype(int)
# convert link columns to missingness flags (1 = link missing)
df['link_white_paper'] = df['link_white_paper'].isnull().astype(int)
df['linkedin_link'] = df['linkedin_link'].isnull().astype(int)
df['github_link'] = df['github_link'].isnull().astype(int)
df['website'] = df['website'].isnull().astype(int)
# create outcome variable success
df['success'] = np.where(df['raised_usd'] >= 500000, 1, 0)
# bin token_for_sale into three levels
i = []
for n in df['token_for_sale']:
    if n <= 40000000:
        i.append(1)
    elif n <= 158982462:
        i.append(2)
    else:
        i.append(3)
This was a really smart thing to do!
df['token_sale_lvl'] = pd.Series(i, index=df.index)
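For reference, the loop above is equivalent to binning with pd.cut using the same thresholds; a small sketch on invented values:

```python
import pandas as pd

# thresholds copied from the loop above; values are made up
token_for_sale = pd.Series([1e6, 5e7, 2e8, 4e7])
lvl = pd.cut(
    token_for_sale,
    bins=[-float('inf'), 40_000_000, 158_982_462, float('inf')],
    labels=[1, 2, 3],
).astype(int)
print(lvl.tolist())  # [1, 2, 3, 1]
```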
df.head(5)
df['pre_ico_start'] = pd.to_datetime(df['pre_ico_start'])
df['pre_ico_end'] = pd.to_datetime(df['pre_ico_end'])
df['duration_pre_ico'] = df['pre_ico_end'] - df['pre_ico_start']
df['duration_pre_ico'] = df['duration_pre_ico'].dt.days
df['duration_pre_ico'] = df['duration_pre_ico'].fillna(0)
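The pre-ICO duration step can be hardened with errors='coerce' so unparseable or missing dates become NaT instead of raising; a sketch with made-up dates:

```python
import pandas as pd

# made-up date strings; None stands in for a missing pre-ICO date
s = pd.Series(['2018-01-01', None])
e = pd.Series(['2018-01-31', None])
start = pd.to_datetime(s, errors='coerce')  # unparseable/missing -> NaT
end = pd.to_datetime(e, errors='coerce')
days = (end - start).dt.days.fillna(0)  # NaT differences -> 0 days
print(days.tolist())  # [30.0, 0.0]
```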
df['success'].describe()
# Potential multicollinearity issue
fig, ax = plt.subplots(figsize=(20, 20))
corr = df.corr()
sns.heatmap(corr, annot=True)
plt.show()
A couple of the variables you chose could have multicollinearity issues, like rating and website, or even github_link and ERC20. Try to drop a few of these variables or swap in different ones.
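One way to act on this note is to list variable pairs whose absolute correlation exceeds a threshold, rather than eyeballing the heatmap. A sketch on a toy frame (data invented: b is an exact multiple of a, c is only weakly related):

```python
import numpy as np
import pandas as pd

# toy data: b = 2*a is perfectly correlated with a
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [4, 1, 3, 2]})
corr = df.corr().abs()
# keep only the upper triangle so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high_pairs = pairs[pairs > 0.9]
print(high_pairs)  # only the (a, b) pair crosses the 0.9 threshold
```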
!pip install statsmodels
Collecting statsmodels
Downloading statsmodels-0.13.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.8 MB)
|████████████████████████████████| 9.8 MB 18.3 MB/s
Requirement already satisfied: numpy>=1.17 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.19.5)
Collecting patsy>=0.5.2
Downloading patsy-0.5.2-py2.py3-none-any.whl (233 kB)
|████████████████████████████████| 233 kB 47.9 MB/s
Requirement already satisfied: scipy>=1.3 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.7.1)
Requirement already satisfied: pandas>=0.25 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.2.5)
Requirement already satisfied: pytz>=2017.3 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from pandas>=0.25->statsmodels) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pandas>=0.25->statsmodels) (2.8.2)
Requirement already satisfied: six in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from patsy>=0.5.2->statsmodels) (1.16.0)
Installing collected packages: patsy, statsmodels
Successfully installed patsy-0.5.2 statsmodels-0.13.0
WARNING: You are using pip version 21.2.4; however, version 21.3 is available.
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.
import statsmodels.api as sm
from statsmodels.formula.api import ols, logit
# dependent/target/outcome variable
y = df['success']
# independent/predictor/explanatory variable
X = df[['token', 'ico_start', 'rating', 'kyc', 'bonus', 'Coinmarketcap_identifier',
        'teamsize', 'is_us_restricted', 'token_sale_lvl', 'platform',
        'duration_pre_ico', 'link_white_paper', 'linkedin_link', 'website']]
# A. Logit regression
# cast independent variables to float (best practice for statsmodels)
# missing='drop' drops rows with missing values from the regression
logit_model = sm.Logit(y, X.astype(float), missing='drop')
# fit the logit model to the data
result = logit_model.fit()
# summarize the logit model
print(result.summary2())
Optimization terminated successfully.
Current function value: 0.516346
Iterations 8
Results: Logit
=========================================================================
Model: Logit Pseudo R-squared: 0.156
Dependent Variable: success AIC: 4801.1024
Date: 2021-10-22 02:16 BIC: 4891.2426
No. Observations: 4622 Log-Likelihood: -2386.6
Df Model: 13 LL-Null: -2828.0
Df Residuals: 4608 LLR p-value: 2.4262e-180
Converged: 1.0000 Scale: 1.0000
No. Iterations: 8.0000
-------------------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------------------------
token 1.1071 0.3240 3.4173 0.0006 0.4722 1.7421
ico_start -3.3642 0.3173 -10.6020 0.0000 -3.9861 -2.7423
rating -0.1217 0.0361 -3.3695 0.0008 -0.1924 -0.0509
kyc -0.2674 0.0758 -3.5270 0.0004 -0.4161 -0.1188
bonus -2.5822 0.3041 -8.4915 0.0000 -3.1782 -1.9862
Coinmarketcap_identifier 0.0003 0.0000 12.2832 0.0000 0.0002 0.0003
teamsize 0.0527 0.0055 9.6078 0.0000 0.0419 0.0634
is_us_restricted -0.4999 0.1161 -4.3067 0.0000 -0.7274 -0.2724
token_sale_lvl -0.1447 0.0399 -3.6257 0.0003 -0.2230 -0.0665
platform -0.1502 0.0741 -2.0268 0.0427 -0.2954 -0.0050
duration_pre_ico -0.0037 0.0010 -3.7667 0.0002 -0.0056 -0.0018
link_white_paper -0.8375 0.1618 -5.1773 0.0000 -1.1545 -0.5204
linkedin_link -0.5456 0.0798 -6.8377 0.0000 -0.7019 -0.3892
website -1.5937 0.1576 -10.1112 0.0000 -1.9026 -1.2848
=========================================================================
'''
options for "at"
1. 'overall' the average of the marginal effects at each observation
2. 'mean'    the marginal effects at the mean of each regressor
3. 'median'  the marginal effects at the median of each regressor
4. 'zero'    the marginal effects at zero for each regressor
5. 'all'     the marginal effects at each observation
options for "method"
1. 'dydx' no transformation; marginal effects are returned
2. 'eyex' estimate elasticities of variables in exog
3. 'dyex' estimate semi-elasticity dy/d(ln x)
4. 'eydx' estimate semi-elasticity d(ln y)/dx
'''
# marginal effects evaluated at the mean of the regressors (MEM, not the average marginal effect)
marginal_effect_at_mean = result.get_margeff(at="mean", method="dydx")
print(marginal_effect_at_mean.summary())
Logit Marginal Effects
=====================================
Dep. Variable: success
Method: dydx
At: mean
============================================================================================
dy/dx std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------
token 0.2005 0.059 3.425 0.001 0.086 0.315
ico_start -0.6091 0.049 -12.449 0.000 -0.705 -0.513
rating -0.0220 0.006 -3.401 0.001 -0.035 -0.009
kyc -0.0484 0.014 -3.519 0.000 -0.075 -0.021
bonus -0.4675 0.052 -9.059 0.000 -0.569 -0.366
Coinmarketcap_identifier 4.919e-05 4.14e-06 11.874 0.000 4.11e-05 5.73e-05
teamsize 0.0095 0.001 9.531 0.000 0.008 0.012
is_us_restricted -0.0905 0.021 -4.301 0.000 -0.132 -0.049
token_sale_lvl -0.0262 0.007 -3.625 0.000 -0.040 -0.012
platform -0.0272 0.013 -2.026 0.043 -0.053 -0.001
duration_pre_ico -0.0007 0.000 -3.765 0.000 -0.001 -0.000
link_white_paper -0.1516 0.029 -5.172 0.000 -0.209 -0.094
linkedin_link -0.0988 0.014 -6.814 0.000 -0.127 -0.070
website -0.2885 0.028 -10.307 0.000 -0.343 -0.234
============================================================================================
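Logit coefficients are on the log-odds scale; exponentiating turns them into odds ratios, which are often easier to discuss. A sketch using three coefficient values copied from the summary table above:

```python
import numpy as np
import pandas as pd

# coefficient values copied from the logit summary above
coefs = pd.Series({'token': 1.1071, 'ico_start': -3.3642, 'teamsize': 0.0527})
odds_ratios = np.exp(coefs)  # e.g. each extra team member multiplies the odds by ~1.05
print(odds_ratios.round(3))
```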
# B. Linear Probability Model (OLS on the binary outcome)
# add an intercept term
X = sm.add_constant(X)
ols_model = sm.OLS(y, X.astype(float), missing='drop')
result = ols_model.fit()
print(result.summary2())
Results: Ordinary least squares
=========================================================================
Model: OLS Adj. R-squared: 0.189
Dependent Variable: success AIC: 4960.5216
Date: 2021-10-22 02:16 BIC: 5057.1003
No. Observations: 4622 Log-Likelihood: -2465.3
Df Model: 14 F-statistic: 77.99
Df Residuals: 4607 Prob (F-statistic): 1.74e-200
R-squared: 0.192 Scale: 0.17070
-------------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-------------------------------------------------------------------------
const -0.0256 0.0337 -0.7593 0.4477 -0.0916 0.0405
token 0.2030 0.0494 4.1069 0.0000 0.1061 0.2999
ico_start -0.2124 0.0212 -10.0326 0.0000 -0.2539 -0.1709
rating 0.1004 0.0103 9.7722 0.0000 0.0803 0.1206
kyc -0.0619 0.0132 -4.6842 0.0000 -0.0878 -0.0360
bonus -0.2476 0.0238 -10.3961 0.0000 -0.2943 -0.2009
Coinmarketcap_identifier 0.0001 0.0000 14.3876 0.0000 0.0001 0.0001
teamsize 0.0086 0.0009 9.3751 0.0000 0.0068 0.0104
is_us_restricted -0.0799 0.0193 -4.1314 0.0000 -0.1177 -0.0420
token_sale_lvl 0.0022 0.0072 0.3041 0.7611 -0.0119 0.0163
platform -0.0049 0.0132 -0.3701 0.7113 -0.0308 0.0210
duration_pre_ico -0.0006 0.0002 -4.0468 0.0001 -0.0009 -0.0003
link_white_paper -0.0950 0.0249 -3.8205 0.0001 -0.1437 -0.0462
linkedin_link -0.0443 0.0147 -3.0151 0.0026 -0.0731 -0.0155
website 0.1272 0.0355 3.5868 0.0003 0.0577 0.1967
-------------------------------------------------------------------------
Omnibus: 713.337 Durbin-Watson: 1.797
Prob(Omnibus): 0.000 Jarque-Bera (JB): 390.330
Skew: 0.571 Prob(JB): 0.000
Kurtosis: 2.150 Condition No.: 14072
=========================================================================
* The condition number is large (1e+04). This might indicate
strong multicollinearity or other numerical problems.
# Reviewer: try to eliminate this multicollinearity issue by changing a few of your variables if possible.
# Use wrapper lazypredict
!pip install lazypredict
Collecting lazypredict
Downloading lazypredict-0.2.9-py2.py3-none-any.whl (12 kB)
Collecting scipy==1.5.4
Downloading scipy-1.5.4-cp37-cp37m-manylinux1_x86_64.whl (25.9 MB)
|████████████████████████████████| 25.9 MB 18.1 MB/s
Collecting click==7.1.2
Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
|████████████████████████████████| 82 kB 2.1 MB/s
Collecting xgboost==1.1.1
Downloading xgboost-1.1.1-py3-none-manylinux2010_x86_64.whl (127.6 MB)
|████████████████████████████████| 127.6 MB 55 kB/s
Collecting joblib==1.0.0
Downloading joblib-1.0.0-py3-none-any.whl (302 kB)
|████████████████████████████████| 302 kB 50.9 MB/s
Collecting pandas==1.0.5
Downloading pandas-1.0.5-cp37-cp37m-manylinux1_x86_64.whl (10.1 MB)
|████████████████████████████████| 10.1 MB 40.1 MB/s
Collecting six==1.15.0
Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting tqdm==4.56.0
Downloading tqdm-4.56.0-py2.py3-none-any.whl (72 kB)
|████████████████████████████████| 72 kB 2.3 MB/s
Collecting scikit-learn==0.23.1
Downloading scikit_learn-0.23.1-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
|████████████████████████████████| 6.8 MB 33.7 MB/s
Collecting pytest==5.4.3
Downloading pytest-5.4.3-py3-none-any.whl (248 kB)
|████████████████████████████████| 248 kB 51.8 MB/s
Collecting lightgbm==2.3.1
Downloading lightgbm-2.3.1-py2.py3-none-manylinux1_x86_64.whl (1.2 MB)
|████████████████████████████████| 1.2 MB 32.3 MB/s
Collecting numpy==1.19.1
Downloading numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
|████████████████████████████████| 14.5 MB 36.3 MB/s
Collecting PyYAML==5.3.1
Downloading PyYAML-5.3.1.tar.gz (269 kB)
|████████████████████████████████| 269 kB 48.9 MB/s
Requirement already satisfied: python-dateutil>=2.6.1 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pandas==1.0.5->lazypredict) (2.8.2)
Requirement already satisfied: pytz>=2017.2 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from pandas==1.0.5->lazypredict) (2021.3)
Requirement already satisfied: wcwidth in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pytest==5.4.3->lazypredict) (0.2.5)
Collecting pluggy<1.0,>=0.12
Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: py>=1.5.0 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from pytest==5.4.3->lazypredict) (1.10.0)
Collecting more-itertools>=4.0.0
Downloading more_itertools-8.10.0-py3-none-any.whl (51 kB)
|████████████████████████████████| 51 kB 538 kB/s
Requirement already satisfied: packaging in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pytest==5.4.3->lazypredict) (21.0)
Requirement already satisfied: attrs>=17.4.0 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pytest==5.4.3->lazypredict) (21.2.0)
Requirement already satisfied: importlib-metadata>=0.12 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pytest==5.4.3->lazypredict) (4.8.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from scikit-learn==0.23.1->lazypredict) (3.0.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest==5.4.3->lazypredict) (3.10.0.2)
Requirement already satisfied: zipp>=0.5 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest==5.4.3->lazypredict) (3.6.0)
Requirement already satisfied: pyparsing>=2.0.2 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from packaging->pytest==5.4.3->lazypredict) (2.4.7)
Building wheels for collected packages: PyYAML
Building wheel for PyYAML (setup.py) ... done
Created wheel for PyYAML: filename=PyYAML-5.3.1-cp37-cp37m-linux_x86_64.whl size=44635 sha256=070b5f2b9c508236a64d81f1b309ce070c41e996483b902b0746d9073a34ff03
Stored in directory: /root/.cache/pip/wheels/5e/03/1e/e1e954795d6f35dfc7b637fe2277bff021303bd9570ecea653
Successfully built PyYAML
Installing collected packages: numpy, six, scipy, joblib, scikit-learn, pluggy, more-itertools, xgboost, tqdm, PyYAML, pytest, pandas, lightgbm, click, lazypredict
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.19.1 which is incompatible.
tensorflow 2.4.1 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.2 which is incompatible.
Successfully installed PyYAML-5.3.1 click-7.1.2 joblib-1.0.0 lazypredict-0.2.9 lightgbm-2.3.1 more-itertools-8.10.0 numpy-1.19.1 pandas-1.0.5 pluggy-0.13.1 pytest-5.4.3 scikit-learn-0.23.1 scipy-1.5.4 six-1.15.0 tqdm-4.56.0 xgboost-1.1.1
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split
/root/venv/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.utils.testing module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.utils. Anything that cannot be imported from sklearn.utils is now part of the private API.
warnings.warn(message, FutureWarning)
# split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
# fit all models
clf = LazyClassifier(predictions=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
100%|██████████| 29/29 [00:11<00:00, 2.43it/s]
models
predictions
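lazypredict ranks many off-the-shelf models at once; to sanity-check any single score it reports, a plain scikit-learn baseline can be fit directly. A sketch on synthetic data (all data here is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic, easily separable data: the label follows the sign of the first feature
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
acc = accuracy_score(y_te, LogisticRegression().fit(X_tr, y_tr).predict(X_te))
print(round(acc, 3))  # near-perfect accuracy on this easy synthetic task
```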
# sanitize "token_for_sale"
df['token_for_sale'] = df['token_for_sale'].fillna(0)
# sanitize "sold_tokens"
df['sold_tokens'] = df['sold_tokens'].fillna(0)
# create outcome variable take_up_rate (1 if at least 60% of the tokens for sale were sold)
# note: where token_for_sale is 0, the ratio is NaN (0/0, counted as 0) or inf (counted as 1)
df['take_up_rate'] = np.where(df['sold_tokens']/df['token_for_sale'] >= 0.6, 1, 0)
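Because both columns were zero-filled, the ratio above can produce NaN (0/0) or inf (positive/0). A guarded variant on toy numbers, assuming a project with nothing for sale should not count as a success:

```python
import numpy as np
import pandas as pd

# toy values: one real success, one empty project, one with sales but no offering recorded
sold = pd.Series([60.0, 0.0, 10.0])
for_sale = pd.Series([100.0, 0.0, 0.0])

# replace 0 denominators with NaN so the ratio never becomes inf
ratio = sold.div(for_sale.replace(0, np.nan))
take_up = (ratio >= 0.6).astype(int)  # NaN comparisons are False -> 0
print(take_up.tolist())  # [1, 0, 0]
```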
# dependent/target/outcome variable
y = df['take_up_rate']
# independent/predictor/explanatory variable
X = df[['token', 'ico_start', 'rating', 'kyc', 'bonus', 'Coinmarketcap_identifier',
        'teamsize', 'is_us_restricted', 'token_sale_lvl', 'platform',
        'duration_pre_ico', 'link_white_paper', 'linkedin_link', 'website']]
# A. Logit regression
# cast independent variables to float (best practice for statsmodels)
# missing='drop' drops rows with missing values from the regression
logit_model = sm.Logit(y, X.astype(float), missing='drop')
# fit the logit model to the data
result = logit_model.fit()
# summarize the logit model
print(result.summary2())
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.052118
Iterations: 35
Results: Logit
===============================================================================
Model: Logit Pseudo R-squared: 0.066
Dependent Variable: take_up_rate AIC: 509.7782
Date: 2021-10-22 02:17 BIC: 599.9183
No. Observations: 4622 Log-Likelihood: -240.89
Df Model: 13 LL-Null: -257.83
Df Residuals: 4608 LLR p-value: 0.0012564
Converged: 0.0000 Scale: 1.0000
No. Iterations: 35.0000
-------------------------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------
token -12.3293 305.1026 -0.0404 0.9678 -610.3194 585.6608
ico_start -1.8362 0.7342 -2.5011 0.0124 -3.2752 -0.3973
rating -1.0045 0.1412 -7.1164 0.0000 -1.2812 -0.7279
kyc -1.0789 0.3709 -2.9086 0.0036 -1.8059 -0.3519
bonus 0.1457 0.6132 0.2375 0.8123 -1.0562 1.3475
Coinmarketcap_identifier 0.0002 0.0001 1.9322 0.0533 -0.0000 0.0003
teamsize 0.0322 0.0195 1.6515 0.0986 -0.0060 0.0704
is_us_restricted -0.2862 0.5302 -0.5398 0.5893 -1.3253 0.7529
token_sale_lvl -0.4607 0.1735 -2.6549 0.0079 -0.8008 -0.1206
platform -0.2858 0.2839 -1.0067 0.3141 -0.8423 0.2706
duration_pre_ico -0.0106 0.0057 -1.8487 0.0645 -0.0218 0.0006
link_white_paper -0.2781 0.5299 -0.5248 0.5997 -1.3167 0.7605
linkedin_link -0.7980 0.3577 -2.2311 0.0257 -1.4991 -0.0970
website -20.5236 2291.0124 -0.0090 0.9929 -4510.8255 4469.7782
===============================================================================
# marginal effects evaluated at the mean of the regressors (MEM, not the average marginal effect)
marginal_effect_at_mean = result.get_margeff(at="mean", method="dydx")
print(marginal_effect_at_mean.summary())
Logit Marginal Effects
=====================================
Dep. Variable: take_up_rate
Method: dydx
At: mean
============================================================================================
dy/dx std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------
token -0.0095 2.373 -0.004 0.997 -4.660 4.641
ico_start -0.0014 0.352 -0.004 0.997 -0.692 0.689
rating -0.0008 0.193 -0.004 0.997 -0.379 0.377
kyc -0.0008 0.207 -0.004 0.997 -0.407 0.405
bonus 0.0001 0.028 0.004 0.997 -0.055 0.055
Coinmarketcap_identifier 1.268e-07 3.17e-05 0.004 0.997 -6.2e-05 6.23e-05
teamsize 2.472e-05 0.006 0.004 0.997 -0.012 0.012
is_us_restricted -0.0002 0.055 -0.004 0.997 -0.108 0.107
token_sale_lvl -0.0004 0.088 -0.004 0.997 -0.174 0.173
platform -0.0002 0.055 -0.004 0.997 -0.108 0.107
duration_pre_ico -8.132e-06 0.002 -0.004 0.997 -0.004 0.004
link_white_paper -0.0002 0.053 -0.004 0.997 -0.105 0.104
linkedin_link -0.0006 0.153 -0.004 0.997 -0.301 0.300
website -0.0158 2.181 -0.007 0.994 -4.290 4.258
============================================================================================
X = sm.add_constant(X)
ols_model = sm.OLS(y, X.astype(float), missing='drop')
result = ols_model.fit()
print(result.summary2())
Results: Ordinary least squares
========================================================================
Model: OLS Adj. R-squared: 0.006
Dependent Variable: take_up_rate AIC: -8248.9253
Date: 2021-10-22 02:17 BIC: -8152.3465
No. Observations: 4622 Log-Likelihood: 4139.5
Df Model: 14 F-statistic: 3.019
Df Residuals: 4607 Prob (F-statistic): 0.000118
R-squared: 0.009 Scale: 0.0097956
------------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
------------------------------------------------------------------------
const 0.0257 0.0081 3.1800 0.0015 0.0098 0.0415
token -0.0185 0.0118 -1.5588 0.1191 -0.0417 0.0048
ico_start -0.0084 0.0051 -1.6484 0.0993 -0.0183 0.0016
rating -0.0008 0.0025 -0.3322 0.7398 -0.0056 0.0040
kyc -0.0115 0.0032 -3.6222 0.0003 -0.0177 -0.0053
bonus 0.0022 0.0057 0.3907 0.6960 -0.0090 0.0134
Coinmarketcap_identifier 0.0000 0.0000 2.1746 0.0297 0.0000 0.0000
teamsize 0.0003 0.0002 1.4314 0.1524 -0.0001 0.0007
is_us_restricted -0.0021 0.0046 -0.4490 0.6534 -0.0112 0.0070
token_sale_lvl -0.0033 0.0017 -1.9053 0.0568 -0.0066 0.0001
platform -0.0015 0.0032 -0.4701 0.6383 -0.0077 0.0047
duration_pre_ico -0.0001 0.0000 -1.5629 0.1181 -0.0001 0.0000
link_white_paper 0.0030 0.0060 0.5072 0.6120 -0.0087 0.0147
linkedin_link -0.0039 0.0035 -1.1103 0.2669 -0.0108 0.0030
website -0.0156 0.0085 -1.8379 0.0661 -0.0323 0.0010
------------------------------------------------------------------------
Omnibus: 7010.995 Durbin-Watson: 1.935
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1765349.299
Skew: 9.738 Prob(JB): 0.000
Kurtosis: 96.741 Condition No.: 14072
========================================================================
* The condition number is large (1e+04). This might indicate
strong multicollinearity or other numerical problems.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
# fit all models
clf = LazyClassifier(predictions=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
100%|██████████| 29/29 [00:07<00:00, 3.65it/s]
models
predictions
Overall, this is a really good project (especially that 98% score!). However, the data does not all speak for itself, so make sure to include some more discussion and resolve the multicollinearity issues.