Introduction: predicting the price of SHIB-USD
For this problem, we're going to focus on financial data. Before we begin, I would like to point out that LSTMs will not make you rich, even though they can be excellent forecasters of time-series data. No model will make you rich.
We'll frame our problem as follows. We have historical price data for SHIB-USD, which includes the following predictors for each day (where we have daily time steps):

- Open: the price at the start of the day
- High: the highest price reached during the day
- Low: the lowest price reached during the day
- Volume: the total amount traded during the day
We aim to take some sequence of the above four values (say, for 100 previous days) and predict the target variable (SHIB-USD's price) for the next 50 days into the future.
Preprocessing and exploratory analysis
We begin by importing the data and quickly cleaning it. Fortunately, financial data is readily available online. We will use Yahoo Finance's historical daily prices for SHIB-USD, available back to August 8, 2020. Import the data using Pandas and have a look.
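A minimal sketch of the import step, assuming the CSV has been downloaded from Yahoo Finance's "Historical Data" tab and saved as SHIB-USD.csv (the filename is illustrative):

```python
import pandas as pd

# Hypothetical filename: a CSV exported from Yahoo Finance,
# with one row per day and a Date column
df = pd.read_csv("SHIB-USD.csv", index_col="Date", parse_dates=True)
print(df.head())
```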
At the bare minimum, your exploratory data analysis should include plotting the target variable of interest. (Some people will argue that you should do much more than this, such as regressing the target variable on the predictors and looking for linear relationships between the variables.) Let's plot the SHIB-USD price over time to see what we're trying to predict.
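Something like the following is enough, assuming the dataframe from above with a Close column:

```python
import matplotlib.pyplot as plt

# Plot the raw closing price over time
plt.plot(df.index, df["Close"])
plt.xlabel("Date")
plt.ylabel("Close (USD)")
plt.title("SHIB-USD closing price")
plt.show()
```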
Setting inputs and outputs
Recall that our predictors will consist of all the columns except our target closing price. Note that the sklearn preprocessors we use below expect 2-D input, which means a single-feature array like our target has to be reshaped. Hence, for the target y, we call .values, which strips the pandas axis labels and allows us to reshape the array.
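A sketch of this step. The column names assume the Yahoo Finance export, and dropping Adj Close alongside Close is an assumption made so that exactly the four predictors above remain:

```python
# Predictors: every column except the target closing price
# (Adj Close is dropped too, leaving Open, High, Low and Volume)
X = df.drop(columns=["Close", "Adj Close"])

# .values strips the pandas axis labels; sklearn preprocessors expect
# 2-D input, so the single-feature target is reshaped to (n_samples, 1)
y = df["Close"].values.reshape(-1, 1)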
We now have the task of standardizing our features. We'll use standardization for our training features X by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as:
z = \frac{x - u}{s}
where u is the mean of the training samples and s is the standard deviation of the training samples. Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For our target y, we will scale and translate each feature individually to between 0 and 1. This transformation is often used as an alternative to zero-mean, unit-variance scaling.
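In sklearn terms, this is a StandardScaler for the features and a MinMaxScaler for the target. A minimal sketch (note that, strictly speaking, the scalers should be fitted on the training portion only to avoid leaking test-set statistics):

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

ss = StandardScaler()  # zero mean, unit variance for the features
mm = MinMaxScaler()    # squashes the target into [0, 1]

X_trans = ss.fit_transform(X)
y_trans = mm.fit_transform(y)
```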
We want to feed the model the 100 samples up to the current day and predict the following 50 time-step values. To do this, we need a special function to ensure that the corresponding indices of X and y represent this structure. Examine this function carefully, but it essentially boils down to taking a window of 100 samples from X, pairing it with the 50 following indices of y, and sliding that window along the data. Note that, because of this, the first 100 values of y never appear as targets, and the final windows without a full 50 future steps are dropped.
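A sketch of such a function (the exact signature is illustrative; only the windowing logic matters):

```python
import numpy as np

def split_sequence(features, target, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(features)):
        # end of the input window and end of the output window
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # stop once there aren't n_steps_out future values left
        if out_end_ix > len(features):
            break
        X.append(features[i:end_ix])
        y.append(target[end_ix:out_end_ix, 0])
    return np.array(X), np.array(y)

X_ss, y_mm = split_sequence(X_trans, y_trans, n_steps_in=100, n_steps_out=50)
```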
Let's check that the first sample in y_mm starts at the 100th sample in the original target y vector.
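Using the arrays from the sketch above, that check is a one-liner:

```python
# the first target window should be samples 100..149 of the scaled target
assert np.allclose(y_mm[0], y_trans[100:150, 0])
```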
Above, we mentioned that we wanted to predict the data several months into the future. Thus, we'll use 95% of the data for training, leaving the remaining 5% as the period we're going to predict. This gives us a training set size of 2763 days, or about seven and a half years. We will predict 145 days into the future, which is almost 5 months.
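Because order matters for time series, this is a simple chronological split with no shuffling:

```python
# hold out the final 5% of windows as the test set
cutoff = round(0.95 * len(X_ss))

X_train, X_test = X_ss[:cutoff], X_ss[cutoff:]
y_train, y_test = y_mm[:cutoff], y_mm[cutoff:]
```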
One more thing we want to check: the data logic of the test set. Sequential data is hard to get your head around, especially when it comes to generating a test set for multi-step output models. Here, we want to take the 100 previous time steps up to the current one and predict 50 time steps into the future. In the test set, we have 150 feature samples, each consisting of 100 time steps of the four predictors. In the test targets, we again have 150 samples, each an array of 50 scalar outputs.
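A quick shape check confirms this layout (the sample count matches the figures quoted above):

```python
print(X_test.shape)  # (150, 100, 4): samples x time steps x features
print(y_test.shape)  # (150, 50):     samples x forecast steps
```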
Since we want a way to validate our results, we need to predict the SHIB-USD price for 50 time steps in the test set for which we actually have the data (i.e. the test targets). Because of the way we wrote split_sequence() above, we simply need to take the last sample of 100 days in X_test, run the model on it, and compare these predictions with the last sample of 50 days in y_test. These correspond to a period of 100 days in X_test's last sample, followed immediately by the next 50 days in the last sample of y_test.
LSTM model
Now we need to construct the LSTM class, inheriting from nn.Module. In contrast to our previous univariate LSTM, we'll build the model with nn.LSTM rather than nn.LSTMCell. This is for two reasons: firstly, it's nice to be exposed to both so that we have the option; secondly, we don't need the flexibility that nn.LSTMCell provides. nn.LSTM is essentially a recurrent application of nn.LSTMCell, so we would only reach for nn.LSTMCell if we wanted to apply other transformations between LSTM layers, such as batch normalization or dropout. However, dropout can be applied automatically via the dropout parameter of nn.LSTM, and we've already standardized our data, so there are few reasons left to use the more fiddly nn.LSTMCell.
As per usual, we'll present the entire model class first and then break it down line by line.
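The original class isn't reproduced here, so what follows is a minimal sketch of such a model built on nn.LSTM; the class name and the hyperparameters (hidden size, number of layers) are illustrative choices, not the author's exact values:

```python
import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, n_outputs):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # batch_first=True -> inputs of shape (batch, seq_len, n_features)
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        # map the final hidden state to the 50 forecast steps
        self.fc = nn.Linear(hidden_size, n_outputs)

    def forward(self, x):
        # zero-initialised hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])  # use the last time step's output

model = LSTM(input_size=4, hidden_size=64, num_layers=1, n_outputs=50)
```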
Training
We use MSE as our loss function and the well-known Adam optimizer.
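In code (the learning rate is an illustrative default, not a tuned value):

```python
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```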
Let's train for 1000 epochs and see what happens. Recall from the previous article that a key part of LSTM debugging is visual cues. Here, training is fast enough that we can simply plot the result at the end; if it's off, we can change our parameters and run it again.
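A bare-bones full-batch training loop along those lines might look like this:

```python
# convert the numpy arrays to float32 tensors for PyTorch
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)

for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    preds = model(X_train_t)          # (batch, 50)
    loss = loss_fn(preds, y_train_t)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"epoch {epoch}: train loss {loss.item():.6f}")
```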
Prediction
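A sketch of the prediction step, feeding in the last 100-day window of the test features and inverting the [0, 1] scaling so the forecast comes back in USD:

```python
model.eval()
with torch.no_grad():
    last_window = torch.tensor(X_test[-1:], dtype=torch.float32)
    pred = model(last_window).numpy()  # shape (1, 50), still scaled

# invert the MinMax scaling to recover actual prices
pred_prices = mm.inverse_transform(pred.reshape(-1, 1)).ravel()
actual_prices = mm.inverse_transform(y_test[-1].reshape(-1, 1)).ravel()

plt.plot(actual_prices, label="actual")
plt.plot(pred_prices, label="predicted")
plt.xlabel("Days ahead")
plt.ylabel("Close (USD)")
plt.legend()
plt.show()
```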
This is good. If we feed in the last 100 days of information, our model successfully predicts a steady decline in the price of SHIB-USD over the next 50 days. For one last plot, let's put this in perspective against the scale of the full price history.
Conclusion
Interestingly, there's essentially no information on the internet on how to construct multi-step output LSTM models for multivariate time-series data. Hopefully, this article gave you the intuition and technical understanding needed to build your own forecasting models. Just remember to think through your input and output shapes very carefully and construct tensors that represent past data predicting future data. PyTorch's LSTM class will take care of the rest, so long as you know the shape of your data.
In terms of next steps, I would recommend running this model on the most recent SHIB-USD data, extending back 100 days from today. See what the model thinks will happen to the price of SHIB-USD over the next 50 days. You could also play with the length of the input window and the forecast horizon; try longer periods and see if the model can pick up on longer-term dependencies. Finally, you should note that these types of LSTMs are not the only solution to multivariate, multi-output forecasting problems. There are many other deep learning approaches, including encoder-decoder networks for variable-length sequences, that you should look into. LSTMs are a great place to start and can give incredible performance if you know how to utilize them.