# Lab #5

In today's lab we'll extend what we've learned so far on RNNs:

  1. Learning our own word embeddings
  2. Improving performance by using `LSTM` and `GRU` layers
  3. Mitigating overfitting by adding recurrent dropout
  4. Improving performance by adding more layers
  5. Improving performance by using a `Bidirectional` layer


In [None]:
# Import needed packages
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import tensorflow as tf
import keras
from tensorflow.keras import backend as K
from keras import datasets, layers, models, preprocessing
from keras.preprocessing import sequence
from keras.datasets import reuters
from keras.utils import to_categorical

Today we'll be working with the [Reuters dataset](https://faroit.com/keras-docs/1.2.2/datasets/#reuters-newswire-topics-classification), a set of short newswires and their topics published by Reuters in 1986. It is built into Keras and can be loaded in the same way as the IMDb dataset. The dataset contains 11,228 newswires labeled over 46 topics. As with the IMDb dataset, each wire is encoded as a sequence of word indexes, and each label is an integer from 0 to 45.

Use the code below to load the data. Here we have chosen the number of words in our dictionary to be 10,000 (just like we did with the IMDb dataset) and the maximum length of a newswire to be 500 words. Any newswire shorter than 500 words will be padded with 0s at the front, and any newswire longer than 500 words will be truncated to its last 500 words (by default, `pad_sequences` pads and truncates at the start of the sequence). We use the `to_categorical` function to transform the y values from a single integer between 0 and 45 (indicating which of the 46 classes the newswire has been labeled as) to a vector of length 46 with a 1 in the position associated with the integer label and 0s everywhere else.
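
To make the padding and encoding concrete, here is a minimal NumPy sketch that imitates what `pad_sequences` (with its default `padding='pre'` and `truncating='pre'` settings) and `to_categorical` do. The helper names `pad_pre` and `one_hot` are made up for this illustration and are not part of Keras.

```python
import numpy as np

def pad_pre(seq, maxlen, value=0):
    """Imitate the pad_sequences defaults: pad and truncate at the front."""
    seq = list(seq)[-maxlen:]                    # keep only the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # left-pad with zeros

def one_hot(labels, num_classes):
    """Imitate to_categorical for integer labels 0..num_classes-1."""
    out = np.zeros((len(labels), num_classes), dtype="float32")
    out[np.arange(len(labels)), labels] = 1.0
    return out

short = pad_pre([5, 8, 2], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
encoded = one_hot(np.array([0, 3, 45]), num_classes=46)

print(short)          # [0, 0, 5, 8, 2]
print(long)           # [3, 4, 5, 6, 7]
print(encoded.shape)  # (3, 46)
```

Note that with these defaults the earliest tokens of a long sequence are the ones that get dropped.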

In [None]:
# Load data
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words = 10000)

x_train = sequence.pad_sequences(x_train, maxlen = 500)
x_test = sequence.pad_sequences(x_test, maxlen = 500)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

# One hot encode labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

x_train shape: (8982, 500)
x_test shape: (2246, 500)


## Question 1

Create an RNN with an `Embedding` layer, 2 `SimpleRNN` layers, and an output layer. Remember that you are performing multiclass classification with 46 total classes. You can choose the number of nodes for each layer and the length of the embedding vector.
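
When two recurrent layers are stacked, the first one has to pass its hidden state at every time step to the second one; in Keras this is what `return_sequences=True` does. The NumPy sketch below is not the Keras implementation, only an illustration of the shapes involved. The function `simple_rnn`, the random weights, and the layer sizes are all assumptions chosen for the example.

```python
import numpy as np

def simple_rnn(x, W_x, W_h, b, return_sequences=False):
    """Forward pass of a basic tanh RNN.

    x has shape (batch, timesteps, input_dim). Returns every hidden state,
    shape (batch, timesteps, units), or only the last one, shape (batch, units).
    """
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, W_h.shape[0]))
    states = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
embed_dim, units = 8, 4                    # hypothetical sizes
x = rng.normal(size=(2, 6, embed_dim))     # stands in for the Embedding output

# Random weights for two stacked layers (illustration only).
W_x1, W_h1 = rng.normal(size=(embed_dim, units)), rng.normal(size=(units, units))
W_x2, W_h2 = rng.normal(size=(units, units)), rng.normal(size=(units, units))
b = np.zeros(units)

seq_out = simple_rnn(x, W_x1, W_h1, b, return_sequences=True)  # feeds layer 2
last_out = simple_rnn(seq_out, W_x2, W_h2, b)                  # feeds the output layer

print(seq_out.shape, last_out.shape)  # (2, 6, 4) (2, 4)
```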

In [None]:
model = keras.Sequential([

])

Compile and train the model for 10 `epochs` with a `batch_size` of 128 and a `validation_split` of 0.3 (use 30% of the training set for the validation set). 
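
It can help to know how many samples each set ends up with. The arithmetic below is a sketch that assumes Keras takes the validation samples from the end of the training arrays, before any shuffling, and rounds the training portion down.

```python
import math

n_samples = 8982          # rows in x_train, as printed when the data was loaded
validation_split = 0.3
batch_size = 128

# Assumed behaviour: the last 30% of the rows become the validation set.
split_at = math.floor(n_samples * (1.0 - validation_split))
n_train, n_val = split_at, n_samples - split_at
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 6287 2695 50
```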

In [None]:
# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? 
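
Alongside the plots, two numbers are often useful when judging overfitting: the epoch at which validation loss is lowest, and the gap between training and validation accuracy at the end of training. The helper below is a small sketch; `summarize_history` is a hypothetical name, and the dictionary values are made up for illustration rather than taken from a real run.

```python
def summarize_history(history_dict):
    """Return the epoch with the lowest validation loss and the final accuracy gap."""
    val_loss = history_dict["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    gap = history_dict["accuracy"][-1] - history_dict["val_accuracy"][-1]
    return best_epoch, gap

# Made-up numbers purely for illustration, not results from this lab.
fake_history = {
    "loss": [2.0, 1.5, 1.1, 0.8, 0.6],
    "val_loss": [1.9, 1.6, 1.5, 1.6, 1.8],
    "accuracy": [0.45, 0.60, 0.70, 0.80, 0.86],
    "val_accuracy": [0.48, 0.58, 0.62, 0.61, 0.60],
}

best_epoch, gap = summarize_history(fake_history)
print(best_epoch, round(gap, 2))  # 3 0.26
```

After training, the same function could be called as `summarize_history(history.history)`.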

In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 2

Now fit the same model as above but with 2 `LSTM` layers instead of `SimpleRNN` layers.
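
As a reminder of what an `LSTM` layer computes at each time step, here is a minimal NumPy sketch of a single LSTM cell update. It is an illustration under assumptions, not the Keras implementation: the order of the gates inside the weight matrices, the random weights, and the sizes are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,).
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)   # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g               # keep part of the old memory, add new
    h_t = o * np.tanh(c_t)                 # hidden state passed onward
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 8, 4                    # hypothetical sizes
x = rng.normal(size=(2, 6, input_dim))     # (batch, timesteps, input_dim)
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros((2, units))
c = np.zeros((2, units))
for t in range(x.shape[1]):
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

print(h.shape, c.shape)  # (2, 4) (2, 4)
```

The separate cell state `c` is what lets an LSTM carry information across many time steps more easily than a `SimpleRNN`.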

In [None]:
model = keras.Sequential([
# Your code here
])

# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? Has the performance of the model improved?

In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 3

Now fit the same model as above but with 2 `GRU` layers rather than 2 `LSTM` layers.
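
One practical difference between the three layer types is the number of trainable weights. The sketch below counts parameters using the textbook formulas: one weight block for `SimpleRNN`, four for `LSTM`, and three for `GRU`. The `reset_after` flag reflects my understanding that the Keras `GRU` default keeps a second bias vector per gate; treat that as an assumption and compare against `model.summary()`. The layer sizes are hypothetical.

```python
def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # four blocks: input gate, forget gate, candidate, output gate
    return 4 * simple_rnn_params(input_dim, units)

def gru_params(input_dim, units, reset_after=True):
    # three blocks: update gate, reset gate, candidate
    n_bias = 2 if reset_after else 1   # assumed extra bias in the Keras default
    return 3 * units * (input_dim + units + n_bias)

input_dim, units = 32, 32   # hypothetical embedding size and layer width
counts = {
    "SimpleRNN": simple_rnn_params(input_dim, units),
    "LSTM": lstm_params(input_dim, units),
    "GRU (classic)": gru_params(input_dim, units, reset_after=False),
    "GRU (reset_after)": gru_params(input_dim, units),
}
print(counts)
# {'SimpleRNN': 2080, 'LSTM': 8320, 'GRU (classic)': 6240, 'GRU (reset_after)': 6336}
```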

In [None]:
model = keras.Sequential([
# Your code here
])

# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? Has the performance of the model improved?


In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 4

Using the same model as in Question 3, try adding the `dropout` and `recurrent_dropout` arguments to the first `GRU` layer. Choose whatever rate you prefer for each type of dropout.
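
The two arguments act on different parts of the layer: `dropout` masks the inputs to the layer, while `recurrent_dropout` masks the hidden state carried from one time step to the next. The NumPy sketch below illustrates the masking idea with "inverted" dropout, where surviving values are rescaled so the expected activation is unchanged. The function `inverted_dropout` and the rates are assumptions for illustration, and the sketch does not reproduce how Keras generates or reuses its masks.

```python
import numpy as np

def inverted_dropout(a, rate, rng):
    """Zero a random fraction `rate` of the entries and rescale the survivors."""
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x_t = np.ones((2, 8))      # stand-in for the layer input at one time step
h_prev = np.ones((2, 4))   # stand-in for the previous hidden state

x_dropped = inverted_dropout(x_t, rate=0.25, rng=rng)    # role of `dropout`
h_dropped = inverted_dropout(h_prev, rate=0.5, rng=rng)  # role of `recurrent_dropout`

# Each entry is either 0 or the original value divided by (1 - rate).
print(np.unique(x_dropped), np.unique(h_dropped))
```

Dropout is only active during training, which is one reason training loss can sit above validation loss in the early epochs.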

In [None]:
model = keras.Sequential([
# Your code here
])

# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? Has the performance of the model improved?


In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 5

Let's try to improve accuracy by increasing the complexity of the model. Add a third `GRU` layer to your model from Question 4 and train the model.

In [None]:
model = keras.Sequential([
# Your code here
])

# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? Has the performance of the model improved?
 

In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 6

Let's see if using a bidirectional RNN will increase performance. Create a model similar to your `LSTM` model in Question 2, but with 1 bidirectional `LSTM` layer.
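
A bidirectional layer runs one recurrent layer over the sequence in its original order and a second, separately weighted copy over the reversed sequence, then merges the two results (by concatenation with the default `merge_mode='concat'`). The NumPy sketch below shows the idea. To keep it short it uses a plain tanh RNN rather than an LSTM, and the function name, weights, and sizes are assumptions for illustration.

```python
import numpy as np

def rnn_last_state(x, W_x, W_h):
    """Run a basic tanh RNN over x and return the final hidden state."""
    h = np.zeros((x.shape[0], W_h.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
    return h

rng = np.random.default_rng(0)
input_dim, units = 8, 4                    # hypothetical sizes
x = rng.normal(size=(2, 6, input_dim))     # (batch, timesteps, input_dim)

# Separate weights for the forward and backward directions.
fwd = (rng.normal(size=(input_dim, units)), rng.normal(size=(units, units)))
bwd = (rng.normal(size=(input_dim, units)), rng.normal(size=(units, units)))

h_fwd = rnn_last_state(x, *fwd)                 # reads first token to last
h_bwd = rnn_last_state(x[:, ::-1, :], *bwd)     # reads last token to first
merged = np.concatenate([h_fwd, h_bwd], axis=-1)

print(merged.shape)  # (2, 8)
```

Because the two directions are concatenated, the layer after a bidirectional layer sees twice as many features as the wrapped layer has units.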

In [None]:
model = keras.Sequential([
# Your code here
])

# Your code here

Plot the loss and accuracy for the training and validation sets. 
Is there evidence of overfitting? Has the performance of the model improved?


In [None]:
train_acc  = history.history['accuracy']
train_loss = history.history['loss']
val_acc  = history.history['val_accuracy']
val_loss = history.history['val_loss']

epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, label = 'Training Loss')
plt.plot(epochs, val_loss, label = 'Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.plot(epochs, train_acc, label = 'Training Accuracy')
plt.plot(epochs, val_acc, label = 'Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

## Question 7

Which model would you choose for this task? Why did you choose that model?
