# Multi-Layer Perceptron in `keras`

In this series of lab sessions, you will use a Python library called `keras` (which is in fact embedded inside a larger library called `tensorflow`, but we will not discuss `tensorflow` in this course).
You should visit the `keras` webpage for more information about this library, including comprehensive documentation.

## The `Sequential` model in `keras`

This library offers two ways to define neural network models.
We will start with the `Sequential` class of `keras` models.
Below is an example of how to define a `Sequential` model:

**1. Define layers, and add them one by one to the model**
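The original code cell is not reproduced in this export; a minimal sketch of this step (layer sizes and activations chosen arbitrarily for illustration) could be:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

# Create an empty Sequential model, then add layers one by one
model = keras.Sequential()
model.add(keras.Input(shape=(784,)))        # input: flattened 28x28 images
model.add(Dense(128, activation="relu"))    # one hidden layer
model.add(Dense(10, activation="softmax"))  # output: 10 class probabilities
```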

**2. Pick an optimization algorithm (optimizer) and a loss function to be optimized**

Usual loss functions are:

- `"mse"` for regression;
- `"categorical_crossentropy"` for multiclass classification (when the `y` array fed to `fit` is of shape $(n, n_\text{classes})$);
- `"binary_crossentropy"` for binary classification (when the model is fed with a `y` array of shape $(n, 1)$).

One can also specify additional metrics to be printed during training (the correct classification rate here).
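For instance, the compilation step for a classifier might look like the following (the optimizer and loss are illustrative choices):

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])

# Pick an optimizer and the loss to be minimized; also report accuracy
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```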

**3. Fit the model**

NB: do not try to execute the following line of code: variables `X_train` and `y_train` do not exist yet!
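For reference, the missing line is essentially a one-line `fit` call; the sketch below makes it runnable by creating dummy `X_train` / `y_train` arrays (shapes and epoch count are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

# Dummy stand-ins for the not-yet-defined X_train / y_train
X_train = np.random.rand(100, 784).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])

# The actual fit call: train for a given number of epochs
history = model.fit(X_train, y_train, epochs=2, batch_size=32, verbose=0)
```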

## Data pre-processing

Have a look at the `prepare_mnist` and `prepare_boston` functions defined below.
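The exact functions are not reproduced in this export; a plausible sketch of what they might look like (the loaders from `keras.datasets` and the preprocessing choices are assumptions) is:

```python
from tensorflow import keras

def prepare_mnist():
    # Load MNIST, flatten the 28x28 images and rescale pixels to [0, 1]
    (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
    X_train = X_train.reshape(-1, 784).astype("float32") / 255.0
    X_test = X_test.reshape(-1, 784).astype("float32") / 255.0
    # One-hot encode the 10 digit classes -> shape (n, 10)
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)
    return X_train, y_train, X_test, y_test

def prepare_boston():
    # Load the Boston housing data and standardize the 13 features
    (X_train, y_train), (X_test, y_test) = \
        keras.datasets.boston_housing.load_data()
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std
    return X_train, y_train, X_test, y_test
```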

**Question #1.** What do these functions do? What are the shapes of the returned arrays? Does the returned data correspond to classification or regression problems?

## Building your first models

In the following, when fitting models, restrict training to 10 epochs (which is not realistic, but training for more epochs takes time...).

**Question #2.** Following the guidelines provided above, implement a linear regression model for the `boston` dataset that optimizes a least squares objective using Stochastic Gradient Descent, and fit your model to the corresponding training data.
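A possible solution sketch follows; the real exercise uses the Boston training data, for which dummy arrays with the same 13-feature shape stand in here:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

# Dummy stand-ins for the Boston training data (13 features per sample)
X_train = np.random.rand(50, 13).astype("float32")
y_train = np.random.rand(50, 1).astype("float32")

# Linear regression = a single output unit with no activation
model = keras.Sequential([keras.Input(shape=(13,)),
                          Dense(1)])
model.compile(optimizer="sgd", loss="mse")  # least squares + SGD
model.fit(X_train, y_train, epochs=10, verbose=0)
```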

**Question #3.** Similarly, define a logistic regression model for the `mnist` dataset and print its training accuracy during training.

**Question #4.** Compare performance (in terms of training accuracy, we will come back to better ways to compare models afterwards) of this logistic regression model with that of a neural network with respectively 1, 2, and 3 hidden layers of 128 neurons each.
You will use the `"relu"` activation function for hidden layers.

**Question #5.** `keras` models offer a `count_params()` method to get the number of parameters to be learned in the model. Use this facility to get the number of parameters of your 3-hidden-layer model, and build a new one-hidden-layer model with an equivalent number of parameters. Compare the performance of these two models with similar numbers of parameters.

## A better way to compare models

Comparing models based on training accuracy (resp. loss) is a "great" way to overfit your model to the training data. A better way to compare models is to use held-out data (a.k.a. a validation set).

To do so, `keras` allows you to pass, at `fit` time, a fraction of the training data to be used as a validation set. Have a look there for more details about how validation samples are selected.
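For example (with dummy data; the held-out fraction is set via the `validation_split` argument):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

X_train = np.random.rand(100, 784).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Keep 30% of the training data aside as a validation set
history = model.fit(X_train, y_train, epochs=2,
                    validation_split=0.3, verbose=0)
```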

**Question #6.** Repeat model comparisons above (relying on validation scores) using 30% of training data as validation set.

## Optimizers and learning rate

**Question #7.** Change the optimizer used for your model. Use an optimizer with momentum and adaptive learning rate.

**Question #8.** Using the docs, vary the learning rate of your optimizer from a very low value to a much larger one so as to show evidence of:

- instability when the learning rate is too large;
- slow convergence when the learning rate is too low.

## Callbacks

Callbacks are tools that, in `keras`, allow one to intervene during the training process of a model.
Callbacks can be used to take actions (*e.g.* save an intermediate model, stop optimization if overfitting occurs, *etc.*).

A first callback one can play with is the one returned by any call to `fit` on a `keras` model.
This callback is an object with a `.history` attribute in the form of a Python dictionary whose keys are the metrics recorded during training. Each of these keys links to an array containing the consecutive values of the considered quantity (one value per epoch).
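Concretely (again fitting a small model on dummy data):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

X = np.random.rand(100, 784).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])

# `fit` returns a History callback; its `.history` dict maps each
# recorded metric to one value per epoch
history = model.fit(X, y, epochs=3, verbose=0)
print(sorted(history.history.keys()))  # e.g. ['accuracy', 'loss']
print(len(history.history["loss"]))    # 3: one value per epoch
```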

**Question #9.** Plot correct classification rates on both training and validation sets.

Other callbacks must be set up explicitly. This is done by passing a list of callbacks to the `fit` method.
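For instance, with a neutral callback such as `CSVLogger` (the file name is illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

X = np.random.rand(100, 784).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# Callbacks are passed as a list at fit time
logger = keras.callbacks.CSVLogger("training_log.csv")
model.fit(X, y, epochs=2, callbacks=[logger], verbose=0)
```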

When training a model takes a long time, one may wish to record intermediate models (in case of a crash during training, or simply because an intermediate model may perform better than the final one).
The `ModelCheckpoint` callback is designed for that purpose.
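A generic usage sketch (the path is illustrative; this basic form simply saves the model at the end of every epoch):

```python
import os
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

X = np.random.rand(100, 784).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

model = keras.Sequential([keras.Input(shape=(784,)),
                          Dense(10, activation="softmax")])
model.compile(optimizer="sgd", loss="categorical_crossentropy")

os.makedirs("checkpoints", exist_ok=True)
# Save the model at the end of each epoch, one file per epoch
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/model_{epoch:02d}.keras")
model.fit(X, y, epochs=2, callbacks=[checkpoint], verbose=0)
```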

**Question #10.** Set up recording of intermediate models every epoch. Save the models into a dedicated folder on your Deepnote project. Only record models if validation loss is lower than for all previous models.

## Regularization

**Question #11.** Add an $\ell_2$ regularization to the weights of your model and show its impact on overfitting. These docs could help.

**Question #12.** Instead of the $\ell_2$ regularization, set up a drop-out strategy and assess its impact on overfitting (you will turn off 10% of the neurons at each training batch).

**Question #13.** Set up an `EarlyStopping` strategy such that training the model will stop in case the validation loss does not decrease for 5 consecutive epochs.

## Hyper-parameter selection in `keras`

In this lab session, you will use a Python library called `keras-tuner` that aims at providing hyper-parameter selection tools for `keras` models.
More specifically, you will be using the Hyperband algorithm to select appropriate hyper-parameters for your models.

### Pre-requisite

The first thing you need to do is to:

a. load your data;

b. define a function that takes an argument named `hp` (we will come back to that later) and returns a compiled `keras` model.

But, even before that, let us install the `keras-tuner` package:
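The install cell itself is missing from this export; it boils down to:

```shell
pip install keras-tuner
```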

Have a look at the code below:
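The cell in question is not reproduced here; a plausible sketch of it (the architecture, optimizer, and the size of the MNIST subset are all assumptions) is:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

# Load MNIST and keep only a small subset to shorten training
(X_train, y_train), _ = keras.datasets.mnist.load_data()
X_train = X_train[:5000].reshape(-1, 784).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train[:5000], 10)

def build_model(hp):
    # `hp` is the hyper-parameter container; it is unused for now
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```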

Note that in the code above, a subset of the MNIST dataset is selected in order to speed up the training process (*i.e.* to fit in the lab session time slot...), but you should never do that in practice, of course!

### Specifying hyper-parameters to be set

**Question #13.** Using the example provided in the `keras-tuner` docs, modify the code in the previous notebook cell such that the number of units in each layer can be picked by the algorithm from the set {64, 128, 192}.

**Now re-execute the cell above for your changes to be taken into account!**

Now, we can define a so-called `tuner` that will perform the hyper-parameter search using Hyperband:

### Launch Hyperband

**Question #14.** The next step is to start the search for good hyper-parameter values using our Hyperband instance. To do so, this instance has a `search` method that can be called just as you would call `fit` on a `keras` model.

**Question #15.** Given the output available above, what is the best hyper-parameter set found during the search?

To check how the search went, one can inspect the results of the search:

And, finally, the best models can be extracted using `tuner.get_best_models(num_models)`.

**Question #16.** Print a summary (containing the number of units per layer) of the best model selected by the tuner.