Introduction: predicting the price of SHIB-USD
For this problem, we're going to focus on financial data. Before we begin, I would like to point out that LSTMs will not make you rich, even though they can be excellent forecasters of time-series data. No model will make you rich.
We'll frame our problem as follows. We have historical price data for SHIB-USD, which includes the following predictors for each day (where we have daily time steps):

- Open: the price at the start of the day
- High: the highest price reached during the day
- Low: the lowest price reached during the day
- Volume: the total amount traded during the day
We aim to take some sequence of the above four values (say, for 100 previous days) and predict the target variable (SHIB-USD's price) for the next 50 days into the future.
Preprocessing and exploratory analysis
We begin by importing the data and quickly cleaning it. Fortunately, financial data is readily available online. We will use Yahoo Finance's historical daily prices for SHIB-USD, available back to August 8, 2020. Import the data using Pandas and have a look.
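A minimal sketch of the import step, assuming the CSV has been downloaded from Yahoo Finance's "Historical Data" tab and saved as SHIB-USD.csv (the filename is illustrative):

```python
import pandas as pd

# Hypothetical filename: a CSV exported from Yahoo Finance,
# with one row per day and a Date column
df = pd.read_csv("SHIB-USD.csv", index_col="Date", parse_dates=True)
print(df.head())
```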
At the bare minimum, your exploratory data analysis should include plotting the target variable of interest. (Some people will argue that you should do much more than this, such as regressing the target variable on the predictors and looking for linear relationships between the variables.) Let's plot the SHIB-USD price over time to see what we're trying to predict.
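Something like the following is enough, assuming the dataframe from above with a Close column:

```python
import matplotlib.pyplot as plt

# Plot the raw closing price over time
plt.plot(df.index, df["Close"])
plt.xlabel("Date")
plt.ylabel("Close (USD)")
plt.title("SHIB-USD closing price")
plt.show()
```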
Setting inputs and outputs
Recall that our predictors will consist of all the columns except our target closing price. Note that the sklearn preprocessors we use below expect 2-D input, which means a single-feature array like our target has to be reshaped. Hence, for the target y, we call .values, which strips the pandas axis labels and allows us to reshape the array.
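A sketch of this step. The column names assume the Yahoo Finance export, and dropping Adj Close alongside Close is an assumption made so that exactly the four predictors above remain:

```python
# Predictors: every column except the target closing price
# (Adj Close is dropped too, leaving Open, High, Low and Volume)
X = df.drop(columns=["Close", "Adj Close"])

# .values strips the pandas axis labels; sklearn preprocessors expect
# 2-D input, so the single-feature target is reshaped to (n_samples, 1)
y = df["Close"].values.reshape(-1, 1)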
We now have the task of standardizing our features. We'll use standardization for our training features X by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as:
z = \frac{x - u}{s}
where u is the mean of the training samples and s is the standard deviation of the training samples. Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For our target y, we will scale and translate each feature individually to between 0 and 1. This transformation is often used as an alternative to zero-mean, unit-variance scaling.
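In sklearn terms, this is a StandardScaler for the features and a MinMaxScaler for the target. A minimal sketch (note that, strictly speaking, the scalers should be fitted on the training portion only to avoid leaking test-set statistics):

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

ss = StandardScaler()  # zero mean, unit variance for the features
mm = MinMaxScaler()    # squashes the target into [0, 1]

X_trans = ss.fit_transform(X)
y_trans = mm.fit_transform(y)
```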
We want to feed the model the 100 samples up to the current day and predict the following 50 time-step values. To do this, we need a special function to ensure that the corresponding indices of X and y represent this structure. Examine this function carefully, but it essentially boils down to taking a window of 100 samples from X, pairing it with the 50 following indices of y, and sliding that window along the data. Note that, because of this, the first 100 values of y never appear as targets, and the final windows without a full 50 future steps are dropped.
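A sketch of such a function (the exact signature is illustrative; only the windowing logic matters):

```python
import numpy as np

def split_sequence(features, target, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(features)):
        # end of the input window and end of the output window
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # stop once there aren't n_steps_out future values left
        if out_end_ix > len(features):
            break
        X.append(features[i:end_ix])
        y.append(target[end_ix:out_end_ix, 0])
    return np.array(X), np.array(y)

X_ss, y_mm = split_sequence(X_trans, y_trans, n_steps_in=100, n_steps_out=50)
```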
Let's check that the first sample in y_mm starts at the 100th sample in the original target y vector.
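Using the arrays from the sketch above, that check is a one-liner:

```python
# the first target window should be samples 100..149 of the scaled target
assert np.allclose(y_mm[0], y_trans[100:150, 0])
```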
Above, we mentioned that we wanted to predict the data several months into the future. Thus, we'll use 95% of the data for training, leaving the remaining 5% as the period we're going to predict. This gives us a training set size of 2763 days, or about seven and a half years. We will predict 145 days into the future, which is almost 5 months.
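Because order matters for time series, this is a simple chronological split with no shuffling:

```python
# hold out the final 5% of windows as the test set
cutoff = round(0.95 * len(X_ss))

X_train, X_test = X_ss[:cutoff], X_ss[cutoff:]
y_train, y_test = y_mm[:cutoff], y_mm[cutoff:]
```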
One more thing we want to check: the data logic of the test set. Sequential data is hard to get your head around, especially when it comes to generating a test set for multi-step output models. Here, we want to take the 100 previous time steps up to the current one and predict 50 time steps into the future. In the test set, we have 150 feature samples, each consisting of 100 time steps of the four predictors. In the test targets, we again have 150 samples, each an array of 50 scalar outputs.
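A quick shape check confirms this layout (the sample count matches the figures quoted above):

```python
print(X_test.shape)  # (150, 100, 4): samples x time steps x features
print(y_test.shape)  # (150, 50):     samples x forecast steps
```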
Since we want a way to validate our results, we need to predict the SHIB-USD price for 50 time steps in the test set for which we actually have the data (i.e. the test targets). Because of the way we wrote split_sequence() above, we simply need to take the last sample of 100 days in X_test, run the model on it, and compare these predictions with the last sample of 50 days in y_test. These correspond to a period of 100 days in X_test's last sample, followed immediately by the next 50 days in the last sample of y_test.
LSTM model
Now we need to construct the LSTM class, inheriting from nn.Module. In contrast to our previous univariate LSTM, we'll build the model with nn.LSTM rather than nn.LSTMCell. This is for two reasons: firstly, it's nice to be exposed to both so that we have the option; secondly, we don't need the flexibility that nn.LSTMCell provides. nn.LSTM is essentially a recurrent application of nn.LSTMCell, so we would only reach for nn.LSTMCell if we wanted to apply other transformations between LSTM layers, such as batch normalization or dropout. However, dropout can be applied automatically via the dropout parameter of nn.LSTM, and we've already standardized our data, so there are few reasons left to use the more fiddly nn.LSTMCell.
As per usual, we'll present the entire model class first and then break it down line by line.
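The original class isn't reproduced here, so what follows is a minimal sketch of such a model built on nn.LSTM; the class name and the hyperparameters (hidden size, number of layers) are illustrative choices, not the author's exact values:

```python
import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, n_outputs):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # batch_first=True -> inputs of shape (batch, seq_len, n_features)
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        # map the final hidden state to the 50 forecast steps
        self.fc = nn.Linear(hidden_size, n_outputs)

    def forward(self, x):
        # zero-initialised hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])  # use the last time step's output

model = LSTM(input_size=4, hidden_size=64, num_layers=1, n_outputs=50)
```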
Training
We use MSE as our loss function and the well-known Adam optimizer.
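In code (the learning rate is an illustrative default, not a tuned value):

```python
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```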
Let's train for 1000 epochs and see what happens. Recall from the previous article that a key part of LSTM debugging is visual cues. Here, training is fast enough that we can simply plot the result at the end; if it's off, we can change our parameters and run it again.
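A bare-bones full-batch training loop along those lines might look like this:

```python
# convert the numpy arrays to float32 tensors for PyTorch
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)

for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    preds = model(X_train_t)          # (batch, 50)
    loss = loss_fn(preds, y_train_t)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"epoch {epoch}: train loss {loss.item():.6f}")
```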
Prediction
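A sketch of the prediction step, feeding in the last 100-day window of the test features and inverting the [0, 1] scaling so the forecast comes back in USD:

```python
model.eval()
with torch.no_grad():
    last_window = torch.tensor(X_test[-1:], dtype=torch.float32)
    pred = model(last_window).numpy()  # shape (1, 50), still scaled

# invert the MinMax scaling to recover actual prices
pred_prices = mm.inverse_transform(pred.reshape(-1, 1)).ravel()
actual_prices = mm.inverse_transform(y_test[-1].reshape(-1, 1)).ravel()

plt.plot(actual_prices, label="actual")
plt.plot(pred_prices, label="predicted")
plt.xlabel("Days ahead")
plt.ylabel("Close (USD)")
plt.legend()
plt.show()
```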
This is good. If we feed in the last 100 days of information, our model successfully predicts a steady decline in the price of SHIB-USD over the next 50 days. For one last plot, let's put this in perspective against the scale of the full price history.
Conclusion
Interestingly, there's essentially no information on the internet on how to construct multi-step output LSTM models for multivariate time-series data. Hopefully, this article gave you the intuition and technical understanding needed to build your own forecasting models. Just remember to think through your input and output shapes very carefully and construct tensors that represent past data predicting future data. PyTorch's LSTM class will take care of the rest, so long as you know the shape of your data.
In terms of next steps, I would recommend running this model on the most recent SHIB-USD data, extending back 100 days from today. See what the model thinks will happen to the price of SHIB-USD over the next 50 days. You could also play with the length of the input window and the forecast horizon; try longer periods and see if the model can pick up on longer-term dependencies. Finally, you should note that these types of LSTMs are not the only solution to multivariate, multi-output forecasting problems. There are many other deep learning approaches, including encoder-decoder networks for variable-length sequences, that you should look into. LSTMs are a great place to start and can give incredible performance if you know how to utilize them.