September Effect
!pip install yfinance --upgrade --no-cache-dir
Requirement already satisfied: yfinance in /root/venv/lib/python3.7/site-packages (0.1.63)
Requirement already satisfied: pandas>=0.24 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from yfinance) (1.2.5)
Requirement already satisfied: requests>=2.20 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from yfinance) (2.26.0)
Requirement already satisfied: lxml>=4.5.1 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from yfinance) (4.6.3)
Requirement already satisfied: multitasking>=0.0.7 in /root/venv/lib/python3.7/site-packages (from yfinance) (0.0.9)
Requirement already satisfied: numpy>=1.15 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from yfinance) (1.19.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from pandas>=0.24->yfinance) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from pandas>=0.24->yfinance) (2021.1)
Requirement already satisfied: six>=1.5 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas>=0.24->yfinance) (1.16.0)
Requirement already satisfied: certifi>=2017.4.17 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from requests>=2.20->yfinance) (2021.5.30)
Requirement already satisfied: idna<4,>=2.5 in /shared-libs/python3.7/py-core/lib/python3.7/site-packages (from requests>=2.20->yfinance) (3.2)
Requirement already satisfied: charset-normalizer~=2.0.0 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from requests>=2.20->yfinance) (2.0.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /shared-libs/python3.7/py/lib/python3.7/site-packages (from requests>=2.20->yfinance) (1.26.7)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import yfinance as yf
# Question: find the symbol (i.e., google the instrument + 'yahoo finance') for any data series you are interested in
# e.g., market/sector index ETF for your chosen country and various asset classes (e.g., Comex Gold's symbol is 'GC=F')
# e.g., SPY (https://finance.yahoo.com/quote/SPY/)
# SPY is an ETF that tracks the S&P 500 index
# IWM is an ETF that tracks the Russell 2000 index of small-cap companies
# DJIA is used here for the Dow Jones Industrial Average
symbols_list_SPY = ['SPY']
symbols_list_DJ = ['DJIA']
symbols_list_IWM = ['IWM']
start = dt.datetime(2015,9,1)
end = dt.datetime(2020,10,31)
data_SPY = yf.download(symbols_list_SPY, start=start, end=end)
data_DJ = yf.download(symbols_list_DJ, start=start, end=end)
data_IWM = yf.download(symbols_list_IWM, start=start, end=end)
data_SPY.info()
data_DJ.info()
data_IWM.info()
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1302 entries, 2015-09-01 to 2020-10-30
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 1302 non-null float64
1 High 1302 non-null float64
2 Low 1302 non-null float64
3 Close 1302 non-null float64
4 Adj Close 1302 non-null float64
5 Volume 1302 non-null int64
dtypes: float64(5), int64(1)
memory usage: 71.2 KB
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1302 entries, 2015-09-01 to 2020-10-30
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 1302 non-null float64
1 High 1302 non-null float64
2 Low 1302 non-null float64
3 Close 1302 non-null float64
4 Adj Close 1302 non-null float64
5 Volume 1302 non-null int64
dtypes: float64(5), int64(1)
memory usage: 71.2 KB
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1302 entries, 2015-09-01 to 2020-10-30
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 1302 non-null float64
1 High 1302 non-null float64
2 Low 1302 non-null float64
3 Close 1302 non-null float64
4 Adj Close 1302 non-null float64
5 Volume 1302 non-null int64
dtypes: float64(5), int64(1)
memory usage: 71.2 KB
data_SPY.head()
data_DJ.head()
data_IWM.head()
df_SPY = data_SPY.reset_index()
df_DJ = data_DJ.reset_index()
df_IWM = data_IWM.reset_index()
df_SPY.info()
df_DJ.info()
df_IWM.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Open 1302 non-null float64
2 High 1302 non-null float64
3 Low 1302 non-null float64
4 Close 1302 non-null float64
5 Adj Close 1302 non-null float64
6 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 71.3 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Open 1302 non-null float64
2 High 1302 non-null float64
3 Low 1302 non-null float64
4 Close 1302 non-null float64
5 Adj Close 1302 non-null float64
6 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 71.3 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Open 1302 non-null float64
2 High 1302 non-null float64
3 Low 1302 non-null float64
4 Close 1302 non-null float64
5 Adj Close 1302 non-null float64
6 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 71.3 KB
# keep only the date, adjusted close, and volume columns
df_SPY = df_SPY[['Date','Adj Close', 'Volume']]
df_DJ = df_DJ[['Date','Adj Close', 'Volume']]
df_IWM = df_IWM[['Date','Adj Close', 'Volume']]
df_SPY.info()
df_DJ.info()
df_IWM.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Adj Close 1302 non-null float64
2 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 30.6 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Adj Close 1302 non-null float64
2 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 30.6 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1302 entries, 0 to 1301
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1302 non-null datetime64[ns]
1 Adj Close 1302 non-null float64
2 Volume 1302 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 30.6 KB
# create variables
df_SPY['month_SPY'] = df_SPY['Date'].dt.month
df_DJ['month_DJ'] = df_DJ['Date'].dt.month
df_IWM['month_IWM'] = df_IWM['Date'].dt.month
df_SPY['return'] = df_SPY['Adj Close'].pct_change()
df_DJ['return'] = df_DJ['Adj Close'].pct_change()
df_IWM['return'] = df_IWM['Adj Close'].pct_change()
df_SPY['annualized_volatility'] = (df_SPY['return'].rolling(252).std())*(252)**(1/2)
df_DJ['annualized_volatility'] = (df_DJ['return'].rolling(252).std())*(252)**(1/2)
df_IWM['annualized_volatility'] = (df_IWM['return'].rolling(252).std())*(252)**(1/2)
# 252 is the number of trading days in a year: the rolling window above spans 252 days, and sqrt(252) annualizes the daily standard deviation
# (a single month would be about 252 / 12 = 21 trading days)
df_SPY.tail()
df_DJ.tail()
df_IWM.tail()
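The return and volatility columns created above can be illustrated on a tiny series. This is a minimal sketch on made-up prices, not market data; the 3-day `window` is an assumption chosen so the result is visible on six rows, whereas the notebook uses 252.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (made-up numbers, not market data).
prices = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.5])

# Simple daily return: (P_t - P_{t-1}) / P_{t-1}; the first value is NaN.
returns = prices.pct_change()

# Rolling sample standard deviation over a short illustrative window,
# scaled by sqrt(252) to express it per year.
window = 3  # assumption for the demo; the notebook uses 252
annualized_vol = returns.rolling(window).std() * np.sqrt(252)

print(returns.round(4).tolist())
print(annualized_vol.round(4).tolist())
```

The leading NaN values in `annualized_vol` are the warm-up period: the rolling window needs `window` valid returns before it produces a number.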
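# create dataframes containing the Sep. and Oct. returns for each of the three indexes
# [1:] drops the first row of each selection; for September that is 2015-09-01, whose return is NaN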
September_returns_SPY = df_SPY.query('''month_SPY == 9''')[1:]
September_returns_DJ = df_DJ.query('''month_DJ == 9''')[1:]
September_returns_IWM = df_IWM.query('''month_IWM == 9''')[1:]
October_returns_SPY = df_SPY.query('''month_SPY == 10''')[1:]
October_returns_DJ = df_DJ.query('''month_DJ == 10''')[1:]
October_returns_IWM = df_IWM.query('''month_IWM == 10''')[1:]
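# dropna() returns a new frame, so assign the result back; only rows with a missing return are dropped here
September_returns_SPY = September_returns_SPY.dropna(subset=['return'])
September_returns_DJ = September_returns_DJ.dropna(subset=['return'])
September_returns_IWM = September_returns_IWM.dropna(subset=['return'])
October_returns_SPY = October_returns_SPY.dropna(subset=['return'])
October_returns_DJ = October_returns_DJ.dropna(subset=['return'])
October_returns_IWM = October_returns_IWM.dropna(subset=['return'])
# the first 252 rows have NaN volatility (rolling-window warm-up); hist() and describe() skip those values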
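The month filter and the effect of `dropna()` can be checked on a small frame. This sketch uses made-up prices and a hypothetical `month` column name; it shows that `dropna()` returns a new frame and leaves the original untouched unless the result is assigned.

```python
import pandas as pd

# Tiny synthetic frame (made-up values) spanning September and October.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-10-01", "2020-10-02"]),
    "Adj Close": [100.0, 101.0, 102.0, 100.0],
})
df["month"] = df["Date"].dt.month
df["return"] = df["Adj Close"].pct_change()

september = df.query("month == 9")
october = df.query("month == 10")

# Without assignment the NaN row stays in `september`.
september.dropna(subset=["return"])
rows_before = len(september)

# Assigning the result actually removes the row with the missing return.
september = september.dropna(subset=["return"])
rows_after = len(september)

print(rows_before, rows_after, len(october))
```

Restricting `subset` to the `return` column keeps rows whose only missing value is in another column, such as a rolling volatility that is still warming up.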
September_returns_SPY['return'].hist(bins=80, color='r', alpha=0.5)
October_returns_SPY['return'].hist(bins=80, color='g', alpha=0.5)
plt.title('SPY Return')
September_returns_DJ['return'].hist(bins=80, color='r', alpha=0.5)
October_returns_DJ['return'].hist(bins=80, color='g', alpha=0.5)
plt.title('DJ Return')
September_returns_IWM['return'].hist(bins=80, color='r', alpha=0.5)
October_returns_IWM['return'].hist(bins=80, color='g', alpha=0.5)
plt.title('IWM Return')
September_returns_SPY['return'].describe()
September_returns_DJ['return'].describe()
September_returns_IWM['return'].describe()
October_returns_SPY['return'].describe()
October_returns_DJ['return'].describe()
October_returns_IWM['return'].describe()
import scipy.stats as stats
print("Difference in mean return of SPY: ")
print((September_returns_SPY['return'].mean() - October_returns_SPY['return'].mean())*100)
stat, p = stats.ttest_ind(September_returns_SPY['return'], October_returns_SPY['return'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return of SPY is significantly different (reject H0)')
else:
    print('The difference in mean return of SPY is not significantly different (fail to reject H0)')
print("Difference in mean return of DJ: ")
print((September_returns_DJ['return'].mean() - October_returns_DJ['return'].mean())*100)
stat, p = stats.ttest_ind(September_returns_DJ['return'], October_returns_DJ['return'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return of DJ is significantly different (reject H0)')
else:
    print('The difference in mean return of DJ is not significantly different (fail to reject H0)')
print("Difference in mean return of IWM: ")
print((September_returns_IWM['return'].mean() - October_returns_IWM['return'].mean())*100)
stat, p = stats.ttest_ind(September_returns_IWM['return'], October_returns_IWM['return'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return of IWM is significantly different (reject H0)')
else:
    print('The difference in mean return of IWM is not significantly different (fail to reject H0)')
Difference in mean return of SPY:
0.0020230214812785347
p value is 0.9863926801130247
The difference in mean return of SPY is not significantly different (fail to reject H0)
Difference in mean return of DJ:
0.02081425137572417
p value is 0.8530174287037202
The difference in mean return of DJ is not significantly different (fail to reject H0)
Difference in mean return of IWM:
0.049851528071846556
p value is 0.7260936000661316
The difference in mean return of IWM is not significantly different (fail to reject H0)
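For all 3 indexes we fail to reject H0: the mean daily returns in September and October are not significantly different at the 5% level.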
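The tests above call `stats.ttest_ind` with `equal_var=False`, which is Welch's t-test. As a rough sketch of what that statistic measures, the following pure-Python version computes the t value and the Welch-Satterthwaite degrees of freedom on made-up returns. The function name `welch_t` and the sample numbers are illustrative assumptions, and the p-value step is omitted.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_a, se_b = var_a / len(a), var_b / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, dof

# Made-up daily returns for two months (illustrative only).
september = [0.010, -0.020, 0.005, 0.000, -0.010]
october = [0.002, 0.004, -0.001, 0.003, 0.000, 0.001]

t_stat, dof = welch_t(september, october)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

A t value close to zero relative to its degrees of freedom corresponds to a large p-value, which is the situation reported for all three indexes.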
September_returns_SPY['annualized_volatility'].hist(bins=80, color='r', alpha=0.5)
October_returns_SPY['annualized_volatility'].hist(bins=80, color='g', alpha=0.5)
plt.title('SPY Volatility')
September_returns_DJ['annualized_volatility'].hist(bins=80, color='r', alpha=0.5)
October_returns_DJ['annualized_volatility'].hist(bins=80, color='g', alpha=0.5)
plt.title('DJ Volatility')
September_returns_IWM['annualized_volatility'].hist(bins=80, color='r', alpha=0.5)
October_returns_IWM['annualized_volatility'].hist(bins=80, color='g', alpha=0.5)
plt.title('IWM Volatility')
September_returns_SPY['annualized_volatility'].describe()
September_returns_DJ['annualized_volatility'].describe()
September_returns_IWM['annualized_volatility'].describe()
October_returns_SPY['annualized_volatility'].describe()
October_returns_DJ['annualized_volatility'].describe()
October_returns_IWM['annualized_volatility'].describe()
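import scipy.stats as stats
# Note: .notna() returns a boolean mask, so the means and t-tests below compare the share of
# non-missing volatility values in each month, not the volatility levels themselves
# (.dropna() would keep the actual values). The three indexes share the same dates, hence the identical output.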
print("Difference in mean return of SPY volatility: ")
print((September_returns_SPY['annualized_volatility'].notna().mean() - October_returns_SPY['annualized_volatility'].notna().mean())*100)
stat, p = stats.ttest_ind(September_returns_SPY['annualized_volatility'].notna(), October_returns_SPY['annualized_volatility'].notna(), equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return SPY volatility is significantly different (reject H0)')
else:
    print('The difference in mean return SPY volatility is not significantly different (fail to reject H0)')
print("Difference in mean return of DJ volatility: ")
print((September_returns_DJ['annualized_volatility'].notna().mean() - October_returns_DJ['annualized_volatility'].notna().mean())*100)
stat, p = stats.ttest_ind(September_returns_DJ['annualized_volatility'].notna(), October_returns_DJ['annualized_volatility'].notna(), equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return DJ volatility is significantly different (reject H0)')
else:
    print('The difference in mean return DJ volatility is not significantly different (fail to reject H0)')
print("Difference in mean return of IWM volatility: ")
print((September_returns_IWM['annualized_volatility'].notna().mean() - October_returns_IWM['annualized_volatility'].notna().mean())*100)
stat, p = stats.ttest_ind(September_returns_IWM['annualized_volatility'].notna(), October_returns_IWM['annualized_volatility'].notna(), equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean return IWM volatility is significantly different (reject H0)')
else:
    print('The difference in mean return IWM volatility is not significantly different (fail to reject H0)')
Difference in mean return of SPY volatility:
-0.6198347107438051
p value is 0.8942774816945345
The difference in mean return SPY volatility is not significantly different (fail to reject H0)
Difference in mean return of DJ volatility:
-0.6198347107438051
p value is 0.8942774816945345
The difference in mean return DJ volatility is not significantly different (fail to reject H0)
Difference in mean return of IWM volatility:
-0.6198347107438051
p value is 0.8942774816945345
The difference in mean return IWM volatility is not significantly different (fail to reject H0)
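# the first argument still contains NaN values, which ttest_ind propagates by default, so the p-value is nan
stat, p = stats.ttest_ind(df_SPY['annualized_volatility'], df_SPY['annualized_volatility'].notna(), equal_var=False)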
print("p value is " + str(p))
p value is nan
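Conclusion from our test: As the p-values show, we fail to reject H0 for all three indexes. Because the tests above were run on `.notna()` masks, they compare how many volatility values are available in each month rather than the volatility levels, which is why the three results are identical. For 2015 to 2020, this analysis therefore gives no evidence that September and October return volatilities differ. Failing to reject H0 does not prove the September effect is a superstition, but it is consistent with that view.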
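The volatility cells above pass `.notna()` to the mean and to the t-test. A small sketch on made-up numbers shows how that differs from `.dropna()`: the first yields a True/False mask, the second keeps the actual values.

```python
import numpy as np
import pandas as pd

# Made-up volatility values with leading NaNs, as a rolling window produces.
vol = pd.Series([np.nan, np.nan, 0.15, 0.20, 0.25])

mask = vol.notna()      # boolean Series marking which entries are present
values = vol.dropna()   # float Series holding only the present entries

share_non_missing = mask.mean()   # fraction of entries that are present
mean_volatility = values.mean()   # average of the actual volatility numbers

print(share_non_missing, round(mean_volatility, 4))
```

Comparing volatility levels between two months would therefore use the `.dropna()` form of each series as the inputs to the test.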
September_returns_SPY['Volume'].hist(bins=100, color='r', alpha=0.5)
October_returns_SPY['Volume'].hist(bins=100, color='g', alpha=0.5)
plt.title('SPY Volume')
September_returns_DJ['Volume'].hist(bins=100, color='r', alpha=0.5)
October_returns_DJ['Volume'].hist(bins=100, color='g', alpha=0.5)
plt.title('DJ Volume')
September_returns_IWM['Volume'].hist(bins=100, color='r', alpha=0.5)
October_returns_IWM['Volume'].hist(bins=100, color='g', alpha=0.5)
plt.title('IWM Volume')
September_returns_SPY['Volume'].describe()
September_returns_DJ['Volume'].describe()
September_returns_IWM['Volume'].describe()
October_returns_SPY['Volume'].describe()
October_returns_DJ['Volume'].describe()
October_returns_IWM['Volume'].describe()
import scipy.stats as stats
print("Difference in mean trading volume of SPY: ")
print(September_returns_SPY['Volume'].mean() - October_returns_SPY['Volume'].mean())
stat, p = stats.ttest_ind(September_returns_SPY['Volume'], October_returns_SPY['Volume'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean trading volume of SPY is significantly different (reject H0)')
else:
    print('The difference in mean trading volume of SPY is not significantly different (fail to reject H0)')
#DJ
print("Difference in mean trading volume of DJ: ")
print(September_returns_DJ['Volume'].mean() - October_returns_DJ['Volume'].mean())
stat, p = stats.ttest_ind(September_returns_DJ['Volume'], October_returns_DJ['Volume'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean trading volume of DJ is significantly different (reject H0)')
else:
    print('The difference in mean trading volume of DJ is not significantly different (fail to reject H0)')
#IWM
print("Difference in mean trading volume of IWM: ")
print(September_returns_IWM['Volume'].mean() - October_returns_IWM['Volume'].mean())
stat, p = stats.ttest_ind(September_returns_IWM['Volume'], October_returns_IWM['Volume'], equal_var=False)
# interpret p-value
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('The difference in mean trading volume of IWM is significantly different (reject H0)')
else:
    print('The difference in mean trading volume of IWM is not significantly different (fail to reject H0)')
Difference in mean trading volume of SPY:
3430217.3553719074
p value is 0.5262269802867261
The difference in mean trading volume of SPY is not significantly different (fail to reject H0)
Difference in mean trading volume of DJ:
29873181.81818199
p value is 0.7102945984110909
The difference in mean trading volume of DJ is not significantly different (fail to reject H0)
Difference in mean trading volume of IWM:
-144323.3471074365
p value is 0.9119290804204188
The difference in mean trading volume of IWM is not significantly different (fail to reject H0)
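The same compare-and-report steps are repeated for each index and each measure, so they could be wrapped in one helper. The following is only a sketch: `compare_months` is a hypothetical name, and the volumes are synthetic draws from a single distribution rather than market data.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def compare_months(sep, oct_, label, alpha=0.05):
    """Welch's t-test on two samples; returns (mean difference, p-value, reject H0?)."""
    # Drop missing values so NaNs do not propagate into the test.
    sep, oct_ = pd.Series(sep).dropna(), pd.Series(oct_).dropna()
    diff = sep.mean() - oct_.mean()
    stat, p = stats.ttest_ind(sep, oct_, equal_var=False)
    reject = bool(p <= alpha)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: difference in means = {diff:.4g}, p = {p:.4f} ({verdict})")
    return diff, p, reject

# Synthetic volumes drawn from the same distribution (not market data).
rng = np.random.default_rng(0)
sep_volume = rng.normal(1e8, 2e7, size=100)
oct_volume = rng.normal(1e8, 2e7, size=110)

diff, p, reject = compare_months(sep_volume, oct_volume, "synthetic volume")
```

Passing the label as an argument keeps the printed index name next to the data it describes, which avoids mismatched labels when a cell is copied for another index.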
Conclusion from our test: As the p-values show, we fail to reject H0 for all three indexes. For 2015 to 2020, the September and October trading volumes show no significant difference. This does not prove the September effect is a superstition, but it is consistent with that view.