Stock Price Analysis
The objective of this Project is to merge data on a selected group of equities, be it indices or individual stocks, and conduct a statistical analysis to look at the correlation of these assets prices and, more importnatly, the returns, as well as the volatility of the securities returns over the past five years. It will also conduct a signifficance test on the correlations to determine if they are signifficant. Finally, it will look at the Risk Adjusted return, teh Sharpe Ratio, to evaluate the securities. This analysis can then be an initial indication of how we would consider constructing a portfolio of assets and diversifying risk. We will select a mix of growth and value assets in order to see some differneces in our results, and a market benchmark, such as teh SP500 to compare individual stocks to the market.
The findings in this report showed that the selected stocks were strongly correlated to each other and the market. However the Gold asset was not statistically signifficant to the 95% confidence level. The report also found that the growth assets were more risky than the market and than defensive assets, in terms of volatility, measured as the standard deviation of the returns. However, two of them, AAPL and MSFT, were the best Risk Adjusted investments, as they had the best Sharpe ratio's. We will see these findings in detail in the following analysis.
The first thing we will require is to download the necessary packages and statistical methods to download and clean the data. Note that besides working with Pandas, Numpy and visualization packages like matplotlib and seaborn, we will use the yfinance module to directly download the price data from Yahoo Finance.
Once we have all the software needed, let's start by downloading the price data from Yahoo Finance using the yfinance package. To do this we will first initialize a list of 6 stocks that I am interested in exploring. One of them will be the market price or SPY, so that we can compare the returns and prices of the other assets to the overall market. I chose three common blue Chip companies like Microsoft, Apple and Amazon, and two other assets, Walmart and Gold, which will hopefully show les correlation to the market.
We will initialize an empty Dataframe to store the price data, and then use yf.download to download the data from Yahoo FInance into this data frame. We will do this using a for loop so we can run it for each of the stocks, and reset the Date index as a column, so we can use it to merge the other downloaded stocks as they come in. Inside the same loop, we will merge the newly aquired stock prices to the dataframe by merging it with the merge() function. This will be done on the basis of the 'Date' and 'Close' columns that all the securities share, since they wil have the same 5 year data as specified in the yf imported function.
[*********************100%***********************] 1 of 1 completed [*********************100%***********************] 1 of 1 completed [*********************100%***********************] 1 of 1 completed [*********************100%***********************] 1 of 1 completed [*********************100%***********************] 1 of 1 completed [*********************100%***********************] 1 of 1 completed
As we can see from the resulting Dataframe above, we now have the Closing price for each asset over the past 5 years, starting in May 1 2018 and ending in May 1st 2023. We have the Date column in datetime object as we like, and the prices are all stored as floats as we would expect. We can also see that we have no missing values or NANs, which is simpler for the purpose of the analysis. Given we have all the relevant data and no need for cleaning, we can now proceed to analyze the data.
First, lets start by visually plotting the data so we can get an ideal sense of how the prices of these securities have done over the five year period. To do this we can use Matplotlib to generate a lineplot of the prices over time.
Since the prices are not on the same scale. Gold has a much higher price per ounce than the 5 stocks per share. Hence we will normalize the prices below, using a simple indexing calculation. This means dividing each stock by its original price and multiplying by 100 to get all the prices indexed to 100. To do this we will reset the 'Date' column as the index so we can index all the columns except the date in one swoop.
As shown by the resulting plot of the data, we now have a more helpful visual of the performance of the assets over time. We can see that Apple has been the biggest performer over the five years, followed by Microsoft, Walmart, SPY, Gold and finally Amazon. This gives us an initial sense of the correlation between the assets too. We can see in general that the asset prices move together fairly strongly, albeit the best performers do so in a much more pronounced fashion. This is due to the market sensitivity of these assets, something we could also explore by calculating the Beta of the assets. However, for now lets first look at the correlatio of the stocks. We can do this visually in two ways.
First, we can look at a pairplot using Seaborn's Pariplot graphing function. This will grapically show the relationship between each stock. We can preemptively think that most of them will show a fairly strong linear relationship, but let's confirm this.
The pair plot confirms our suspicion. That all of the assets have a strong linear relationship. This means that they follow the same performance as the respective pair. For example, if Apple and Microsoft have a strong linear relationship, as Apple increases in price, so will Microsoft. This is important to know in security analysis and portfolio construction because diversification is key, and you usually want to diversify your assets to minimize the volatility of your portfolio.
A more precise approach is to numerically calculate the relationship of these assets. We can do this by calculating a correlation matrix. This can be done usign the corr() method to compute the correlation matrix of the dataframe (after dropping the date column since we do not want compute the correlation of the Date object). This will give us a numerical value between -1 and 1 that shows the strength of the correlation between two variables, in this case the stocks. We can then use this dataframe and pass it to seaborn's heatmap graph, which will color code the strength of these relationships.
The resulting heatmap is a useful visual to interpret the correlation between the stocks. As we can see, most these assets are strongly correlated to each other, given the positive correlation coefficients. For example, Microsoft adn Apple have the trongest correlation of .97, hence if you hold both these assets you can expect almost the same return from both assets. This also means you are exposed to the same type of risk if the asset moves to the downside. MSFT and SPY, the market, also had the strngest correlation, which is unsuprising since MSFT is a string component of the S&P 500, and a similar case happened with APPLE and SPY.
The least correlated asets are highlighted in blue. We can see that Amazon's negative price performance broke some of its correlation to the market adn the rest of the assets, uet generally still moves in the same direction, with a correlation of .73 on average. The asset we would have expected to have less correlation to the market, Gold (GC = F), only had mild correlation with the market (SPY) and AMZN, but still ad a signifficant correlation to the other assets.
However lets keep in mind that these correlations measure the strenght between asset prices, and such calcualtion may ignore the magnitude of these changes. A more relevant metric to look at is the Percentage Returns of the varibles over time. After all that is what investors care about.
Hence, the following step in the analsis is to look at the correlation and volatility of the monthly returns of these assets. So the first thing we need to do is compute the monthly returns of each asset. Luckily, there are methods for resampling time series data at the end of each month(.resample('M').last), and we can use the pct_change() method to compute the percentage change from the previous month.
As shown by the resulting dataframe produced, we now have the monthly percentage returns of each of the 6 assets indexed by their date. Now we can plot these results to show a true indication of monthly performance over hte five year period. Now the next step is to visualize these returns to help us show a time series representation of the performance of the 6 assets over time.
The plot reveals several intersting things. Generally, we can see that the returns are still positively correlated. We can see patterns in the data where teh assets move in the same direction. However it is clear from the plot that the magnitude, or volatility, of these changes, varies across the assets. Apple and Amazon seem to have the highest volatility of the assets. Furthermore, we can see general market patterns, such as the sharp market corrections in 2019 during the pandmeic, a big market rally in 2019, another crash in 2022 adn subsequent rallies. There has been quite a lot of volatility!
Let's reinspect the correlations of teh returns again to see if the assets still maintain the same relationship. THe correlation matrix is computed and plotted as a heatmap.
We can see that we now have a different correlation matrix than before. The returns of the assets are no longer as strongly correlated as the prices. This is important because as investors we care about the returns of the securities over time. Now we can see that in fact, the returns in the price of gold have a much weaker correlation to the six stocks, so that when the prices of teh other assets increase, Gold will do so less heavily. But that also applies to the downside risk. We can also see that Walmart, a defensive stock, is less heavily correlated to the prices of growth stocks like Amazon or Microsoft, and in general the market.
We can try and test these correlation for statistical signifficance to confirm we can trust to a high degree of confidence the relationship between these variables. THis can be done by computing the p-values of the peason correlation coefficients.
For each pair of stocks, the pearsonr() function is used to calculate the Pearson correlation coefficient and the p-value. The p-values are stored in a new DataFrame.If the p-value is less than 0.05, you can reject the null hypothesis and conclude that the correlation is statistically significant at the 5% level. If the p-value is greater than 0.05, you cannot reject the null hypothesis, and the correlation is not statistically significant at the 5% level.
From the results we can see that the Gold coefficients of correlation do not seem to be signifficant at teh 95% confidence interval but most of our other statistics are so. This is important to keep in mind when evaluating the results.
A very important measure in finance is risk. This is usually measured in terms of the standard deviation. To get a sense of the volatility of these returns, lets get some summary statistics and oberve the standard deviation of the returns of each stock. This gives us an indication of how much the returns of these assets vary from the mean returns over the period.
AAPL GC=F AMZN SPY MSFT WMT count 59.000000 59.000000 59.000000 59.000000 59.000000 59.000000 mean 2.636325 0.800315 0.919353 0.877254 2.140408 1.165386 std 9.357074 3.959971 9.977807 5.462855 6.429473 5.253441 min -18.404459 -6.922468 -23.752509 -12.998720 -10.926686 -15.922609 25% -5.368833 -2.150232 -5.555641 -2.503813 -2.345018 -2.344575 50% 4.092967 0.244545 1.358730 1.854927 2.254335 1.515256 75% 9.790480 3.234280 6.411452 4.141946 6.266379 4.364380 max 21.437974 9.470164 27.059597 12.698355 17.629107 10.180522
With this simple code, we can see that AMZN has the highest standard deviation of 9.97% on a monthly basis, followed closely by AAPL at 9.36%. Meanwhile Gold is the most stable, less volatile asset with a standard deviation of 3.95%. Thes are important considerations since the standard deviation measures the risk of a security. Generally, it is important to compare these measures to the market, hence we can say that AMZN and AAPL are more volatile, or risky than the market (5.46%) and Gold and Walmart, more defensive assets, are less volatile.
Volatility is important to consider in terms of return. For example we can see that the more risky assets, such as AAPL adn MSFT also have a higher mean montlhy return, while gold and Walmart have lower average returns. AN interesting metric to compute in this regard is the Sharpe Ratio. In finance, the Sharpe Ratio is a risk-adjusted measure of return. It is computed as the average return earned in excess of the risk-free rate per unit of volatility or total risk. Hence to compute it, I will take the average return, subtract the risk free rate, which I will consider to be zero in this case, and divide it by the volatility.
AAPL 0.063157 GC=F 0.039342 AMZN 0.021418 SPY 0.033330 MSFT 0.057487 WMT 0.037445 dtype: float64
As shown by the resulting ratios, we can see that the best risk adjusted return was obtained by AAPLE, followed by Microsoft. So even though these securities are more volatile, they have the highest reward per unit of risk. Meanwhile Amazon had teh worst return per unit of risk of these securites.
A good visual to represent the volatility of returns is a boxplot. This can show us the mean spread of teh data, and any outliers there may be. We can do this using seaborns boxplot function. In order to do this, seaborn usually profers long formatted data, so I will melt the return dataframe so that we have two columns, an Asset column and a Return column.