OK, thus far we have been talking about linear models, all of which can be viewed as single-layer neural nets. The next step is to move on to multi-layer nets. Training these is a bit more involved, and implementing them from scratch takes time and effort. Instead, we will use well-established libraries. I prefer PyTorch, which is based on an earlier library called Torch (designed for training neural nets via backprop).
We use CUDA if it is available; otherwise, we fall back to the CPU.
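A minimal sketch of that device check (the variable name `device` is my choice):

```python
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```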
PyTorch handles data types a bit differently: everything in PyTorch is a tensor.
The idea is that tensors allow for easy forward passes (function evaluations) and backward passes (gradient computations).
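Here is a small illustrative example of a forward and backward pass (the function y = w*x + b is just a toy choice for illustration):

```python
import torch

# Forward pass: evaluate y = w * x + b
x = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
y = w * x + b

# Backward pass: autograd fills in the gradients of y w.r.t. w and b
y.backward()
print(w.grad)   # dy/dw = x = 2
print(b.grad)   # dy/db = 1
```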
Notice how the backward pass computed the gradients using autograd. OK, enough background. Time to train some networks. Let us load the Fashion MNIST dataset, which is a database of grayscale images of clothing items.
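One way to load it is through `torchvision` (the variable names `train_dataset` and `test_dataset` are my choices; the first run downloads the data to `./data`):

```python
import torchvision
import torchvision.transforms as transforms

# Convert each image to a PyTorch tensor with values in [0, 1]
transform = transforms.ToTensor()

train_dataset = torchvision.datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=transform
)
test_dataset = torchvision.datasets.FashionMNIST(
    root="./data", train=False, download=True, transform=transform
)
```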
Let us examine the size of the dataset.
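For example (the counts below are the standard Fashion MNIST split):

```python
print(len(train_dataset))   # 60000 training images
print(len(test_dataset))    # 10000 test images
```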
The dataset contains images of size (28, 28). After flattening, each data point is a vector of length 784.
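A quick shape check, continuing with `train_dataset` from above:

```python
image, label = train_dataset[0]
print(image.shape)            # torch.Size([1, 28, 28])
print(image.view(-1).shape)   # torch.Size([784]) after flattening
```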
Let us try to visualize some of the images. Since each data point is a PyTorch tensor (not a NumPy array), we need to postprocess it before using matplotlib.
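A minimal sketch, converting a single training image to a NumPy array before plotting:

```python
import matplotlib.pyplot as plt

image, label = train_dataset[0]

# squeeze() drops the channel dimension, numpy() converts the tensor to an array
plt.imshow(image.squeeze().numpy(), cmap="gray")
plt.title(f"label = {label}")
plt.show()
```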
Let's try plotting several images. This is conveniently achieved in PyTorch using a data loader, which loads data in batches.
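A sketch using `torch.utils.data.DataLoader` (the batch size of 64 and the 1x8 grid of plots are my choices):

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Grab one batch and plot the first eight images
images, labels = next(iter(train_loader))
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, img, lbl in zip(axes, images, labels):
    ax.imshow(img.squeeze().numpy(), cmap="gray")
    ax.set_title(int(lbl))
    ax.axis("off")
plt.show()
```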
Now we are ready to define our linear model. Here is some boilerplate PyTorch code that implements the forward pass of a single-layer network for logistic regression (similar to the one discussed in the class notes).
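The exact code from the class notes is not reproduced here, but a single-layer model along those lines might look like this (the class name `SingleLayerNet` is my choice; `device` is the one chosen earlier):

```python
import torch.nn as nn

class SingleLayerNet(nn.Module):
    """A single linear layer mapping flattened 28x28 images to 10 class scores."""
    def __init__(self, input_dim=784, num_classes=10):
        super().__init__()
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten each image to a 784-dimensional vector
        return self.linear(x)

net = SingleLayerNet().to(device)
```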
Cool! Now we have set everything up. Let's try to train the network.
This block trains the neural network for 20 epochs and prints the training loss every 5 epochs.
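A sketch of such a training loop (the choice of cross-entropy loss, SGD, and the learning rate are my assumptions; the original block may differ):

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

num_epochs = 20
for epoch in range(num_epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()               # reset gradients from the previous step
        outputs = net(images)               # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                     # backward pass via autograd
        optimizer.step()                    # update the weights

        running_loss += loss.item()

    if (epoch + 1) % 5 == 0:
        print(f"epoch {epoch + 1}: training loss = {running_loss / len(train_loader):.4f}")
```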
This block runs the model on the test data for 20 epochs and prints the test loss every 5 epochs.
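A sketch matching that description (no parameters are updated here, so every pass over the test set yields the same loss):

```python
num_epochs = 20
with torch.no_grad():                       # no gradients needed for evaluation
    for epoch in range(num_epochs):
        test_loss = 0.0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            test_loss += criterion(outputs, labels).item()

        if (epoch + 1) % 5 == 0:
            print(f"epoch {epoch + 1}: test loss = {test_loss / len(test_loader):.4f}")
```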
Neat! Now let's evaluate our model's accuracy on the entire dataset. The predicted class label for a given input image can be computed by looking at the output of the neural network and taking the index corresponding to the maximum activation. Something like:
predicted_output = net(images)
_, predicted_labels = torch.max(predicted_output, 1)
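Building on that snippet, a sketch of the full accuracy computation (shown here over the test loader defined earlier; the same loop works for the training set):

```python
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        predicted_output = net(images)
        _, predicted_labels = torch.max(predicted_output, 1)
        correct += (predicted_labels == labels).sum().item()
        total += labels.size(0)

print(f"accuracy = {correct / total:.4f}")
```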