My adventure in Deep Learning — part 3

Augusto Gonzalez-Bonorino
Jun 17, 2021

If you made it this far, congratulations! Hopefully, I was able to make the theory part interesting and entertaining. Today we will start looking at some code (yes, I know, finally!) and at some basic implementations of Deep Learning models using PyTorch. This blog post will be organized in the following way:

  1. We will download a dataset and get it ready to feed into our model.
  2. Then, we will build and train a model to perform text classification.

With respect to data gathering, at the end of this post I will provide some advice on how to find the data that best fits your needs and how to leverage the DataLoaders class from fast.ai to simplify this process.

Disclaimer: All of this code has been extracted from a PyTorch tutorial.

Let’s get started…

The first thing you should always keep in mind is the following: do not be a perfectionist! It will only waste your time during development. Do not spend months trying to find the perfect dataset for your model, or trying to develop the best-looking interface for your application. Find good-enough data and start training and iterating, especially during your learning (or training, to honor our deep learning jargon) process. Practice makes perfect, and as I said in part 1 of this series, I will be putting emphasis on gathering the experience necessary for y'all to develop models on your own.

Today we will be working with text data and leveraging techniques from Natural Language Processing (NLP) to create a name classifier that, given a name or a word, predicts its root, meaning the language the name originally comes from.

Note: For those who do not know, NLP is an interdisciplinary branch of AI and Deep Learning that combines techniques and practices from linguistics and computer science to extract knowledge from natural language, which is the language humans speak and write.

First, download the data from here and upload it to your directory. I recommend loading it into Google Colab, especially if you are a newbie like me. Make sure to give your folders appropriate names, or adapt the code accordingly (the code below expects the files at data/names). The data we have downloaded contains a subfolder called ‘names’ with a bunch of text files, one per language, each containing names from that language. We will use them to train a Recurrent Neural Network (RNN) to make predictions.

If you are not familiar with RNNs, here is a very insightful article that introduces them and provides multiple examples on how to use them.

from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
import unicodedata
import string

def findFiles(path):
    # Return every file path that matches the glob pattern
    return glob.glob(path)

print(findFiles('data/names/*.txt'))


all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
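def unicodeToAscii(s):
    # Decompose accented characters (NFD), then drop the combining accent
    # marks (category 'Mn') and any character that is not in all_letters
    return ''.join(
        c
        for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )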

print(unicodeToAscii('Ślusàrski'))

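The first chunk of code focuses on processing our data to get it into the right format, so we can then feed it to our RNN. First, we find all the name files in our folder. Second, we convert each name to plain ASCII: unicodeToAscii splits accented characters into a base letter plus a separate accent mark (Unicode normalization form NFD), drops the accent marks, and keeps only the characters in all_letters, so ‘Ślusàrski’ becomes ‘Slusarski’.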

ASCII stands for American Standard Code for Information Interchange. It is a 7-bit character code, so it can represent 128 characters, each identified by a unique 7-bit value.
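To see what unicodeToAscii is doing, here is a small standalone sketch using only the standard library. NFD normalization splits an accented letter such as ‘à’ into its base letter plus a separate combining accent (Unicode category ‘Mn’, a non-spacing mark), and that mark is exactly what the filter drops. The helper name strip_accents is hypothetical: it mirrors unicodeToAscii without the all_letters filter.

```python
import unicodedata

def strip_accents(s):
    # Hypothetical helper: like unicodeToAscii, minus the all_letters filter
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

# 'à' decomposes into 'a' followed by a combining grave accent (U+0300)
parts = unicodedata.normalize('NFD', 'à')
print([hex(ord(c)) for c in parts])               # ['0x61', '0x300']
print([unicodedata.category(c) for c in parts])   # ['Ll', 'Mn']
print(strip_accents('Ślusàrski'))                 # Slusarski
```

Letters that have no decomposed form (for example the Polish ‘ł’) are not split by NFD, so in the post's version they are simply removed by the all_letters check.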

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for filename in findFiles('data/names/*.txt'):
    # The category is the file name without its .txt extension
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

After reading the names from the files, we must create a dictionary of the form {‘language’: [names…]}. For this we create the helper function readLines, which takes the name of a file as input and returns each line in that file converted to ASCII. Then, to create the dictionary, we iterate over each file to extract the category (which is just the name of the text file without its extension), add it to all_categories, and use our helper function to convert the names in that file, so that in category_lines each category maps to its list of names.
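If you want to check this logic without the dataset, here is a minimal sketch that writes two tiny files with made-up names to a temporary folder and builds the same {‘language’: [names…]} structure. The folder and its contents are hypothetical stand-ins for data/names/*.txt, and unicodeToAscii is skipped for brevity.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for data/names/: two tiny files, one per language
tmp_dir = tempfile.mkdtemp()
samples = {'Spanish': 'Abana\nAbano\n', 'Italian': 'Abandonato\nAbatangelo\n'}
for language, text in samples.items():
    with open(os.path.join(tmp_dir, language + '.txt'), 'w', encoding='utf8') as f:
        f.write(text)

category_lines = {}
all_categories = []
for filename in sorted(glob.glob(os.path.join(tmp_dir, '*.txt'))):
    # The category is the file name without its .txt extension
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    with open(filename, encoding='utf8') as f:
        category_lines[category] = f.read().strip().split('\n')

print(all_categories)              # ['Italian', 'Spanish']
print(category_lines['Spanish'])   # ['Abana', 'Abano']
```

Sorting the glob results is an extra step that is not in the post's code: glob.glob does not guarantee any particular order, and sorting keeps each category's index the same between runs.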

Feel free to explore what the data looks like at the moment; here I am printing the first ten names in Spanish:

print(category_lines['Spanish'][:10])

Awesome! Now we have our data organized… But neural networks have no clue what a name is or how to interpret text. In order to use it, we must convert our text (or any categorical data, for that matter) to numbers (vectors, to be more precise). In Deep Learning these numbers are stored in tensors, which are basically a mathematical structure that generalizes scalars, vectors, and matrices to any number of dimensions. For example, a zero-dimensional tensor is a scalar, a one-dimensional tensor is a vector, a two-dimensional tensor is a matrix, and tensors can have even more dimensions (our encoded names will be three-dimensional). Tensors are very similar to NumPy’s ndarrays; the main differences are that they can run on GPUs, which can significantly speed up computation, and that PyTorch can automatically compute gradients through them.
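A quick way to see the difference between scalars, vectors, matrices, and higher-dimensional tensors is to check each tensor's number of dimensions. This sketch also shows the round trip to NumPy and how to move a tensor to a GPU only when one is available. The variable names are just for illustration, and the (5, 1, 57) shape anticipates the encoding of a five-letter name later in this post.

```python
import numpy as np
import torch

scalar = torch.tensor(3.0)                 # 0-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])     # 1-dimensional tensor
matrix = torch.zeros(2, 3)                 # 2-dimensional tensor
batch = torch.zeros(5, 1, 57)              # 3-dimensional, like an encoded 5-letter name

print(scalar.dim(), vector.dim(), matrix.dim(), batch.dim())  # 0 1 2 3

# Converting to and from NumPy ndarrays (CPU tensors share memory with the array)
array = matrix.numpy()
back = torch.from_numpy(np.ones((2, 3), dtype=np.float32))
print(type(array).__name__, back.dtype)    # ndarray torch.float32

# Move to a GPU only if one is available, otherwise stay on the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(vector.to(device).device)
```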

import torch

# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Turn a line into a <line_length x 1 x n_letters>,
# or an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor

print(lineToTensor('Jones').size())  # torch.Size([5, 1, 57])

This technique is called one-hot encoding: each letter becomes a vector of n_letters zeros with a single 1 at that letter’s index, and a word becomes a stack of these vectors, one per letter. Amazing! Now we have everything we need to start building our first Recurrent Neural Network. We can do this very easily by leveraging the torch.nn API as follows:
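To make one-hot encoding concrete, here is a self-contained sketch that re-creates lineToTensor and compares it with PyTorch's built-in torch.nn.functional.one_hot, which turns integer indices into one-hot vectors. Decoding with argmax shows that no information is lost.

```python
import string
import torch
import torch.nn.functional as F

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)  # 57

def lineToTensor(line):
    # Same encoding as in the post: <line_length x 1 x n_letters>
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

encoded = lineToTensor('Jones')
indices = torch.tensor([all_letters.find(c) for c in 'Jones'])

# F.one_hot builds the same vectors from integer letter indices
same = F.one_hot(indices, num_classes=n_letters).float().unsqueeze(1)
print(torch.equal(encoded, same))  # True

# Each row has exactly one 1; argmax recovers the original letters
decoded = ''.join(all_letters[i] for i in encoded.squeeze(1).argmax(dim=1).tolist())
print(decoded)  # Jones
```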

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers read the current letter concatenated with the previous hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)    # new hidden state, passed on to the next letter
        output = self.i2o(combined)
        output = self.softmax(output)  # log-probabilities over the categories
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

n_hidden = 128
rnn = RNN(n_letters, n_hidden, n_categories)

In this post, we will not be diving deep into the details of each part of a neural network because, as I explained in my first blog, I am following a top-down approach to teaching these concepts. So, for now just understand that every neural network class inherits from the nn.Module parent class and is built by adding layers to it. Each layer transforms its input (here, the two nn.Linear layers compute weighted sums of their inputs), and functions such as the LogSoftmax at the end turn the result into a prediction that will later be assessed by a loss function.

Note: If you are unfamiliar with some of these terms please check out part 2 of this series where I write about the main concepts and jargon needed to work in Machine or Deep Learning.
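One way to see what inheriting from nn.Module gives us: every layer assigned as an attribute is registered automatically, so its weights and biases show up in parameters(), and those are exactly what training will update. This sketch repeats the post's RNN class (with a condensed forward and without initHidden) and assumes 18 categories, the number of language files in the PyTorch tutorial data; adjust it to your own n_categories.

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    # Same layers as the post's RNN
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

# 57 letters, 128 hidden units, 18 categories (assumed)
rnn = RNN(57, 128, 18)
for name, param in rnn.named_parameters():
    print(name, tuple(param.shape))
# i2h.weight (128, 185)
# i2h.bias (128,)
# i2o.weight (18, 185)
# i2o.bias (18,)
print(sum(p.numel() for p in rnn.parameters()))  # 27156
```

LogSoftmax has no parameters of its own, so it does not appear in the list.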

The __init__ function creates the layers our model needs, and the forward function uses them to compute a prediction: the log-probabilities (probabilities normalized with the LogSoftmax function, then logged) of the word belonging to each language, together with a new hidden state. initHidden creates the initial hidden state, a tensor of zeros whose size is set by our n_hidden hyperparameter. The hidden state carries information from one letter to the next: each call to forward combines the current letter with the hidden state produced by the previous letter, which is what lets an RNN read a sequence. Learning itself happens separately, when we update the model’s parameters based on how wrong its predictions were. This is the core idea of how machines “learn”: we want to improve our predictions with each iteration, and if the parameters were never updated, the model would always output the same probabilities for the same input. Let’s get our first output:

input = lineToTensor('Augusto')
hidden = torch.zeros(1, n_hidden)
output, next_hidden = rnn(input[0], hidden)
print(output)

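I encoded my name and passed its first letter (input[0]) to the RNN we just created, together with an initial hidden state of zeros. This outputs a 1 x n_categories tensor where each number is the log-probability of the name pertaining to that category: the larger the number (the closer to 0), the more likely it is to pertain to that category. Since the model has not been trained yet, these values are not meaningful yet.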
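Because only the first letter was passed above, that output does not depend on the rest of the name. Here is a sketch of reading the whole name one letter at a time, carrying the hidden state forward; it is the same loop that the train and evaluate functions use later. It repeats the definitions so it runs on its own and, as before, assumes 18 categories. With an untrained model the result is still essentially a guess.

```python
import string
import torch
import torch.nn as nn

all_letters = string.ascii_letters + " .,;'"
n_letters, n_hidden, n_categories = len(all_letters), 128, 18  # 18 is an assumption

def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

rnn = RNN(n_letters, n_hidden, n_categories)
line_tensor = lineToTensor('Augusto')
hidden = torch.zeros(1, n_hidden)

# Feed one letter at a time; each step receives the hidden state of the previous one
for i in range(line_tensor.size(0)):
    output, hidden = rnn(line_tensor[i], hidden)

print(output.shape)                # torch.Size([1, 18])
print(output.exp().sum().item())   # close to 1.0: exp of log-probabilities sums to 1
```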

Great! Two little details remain before training our RNN: getting the index of the highest probability, and having a way to automatically select random inputs, meaning a name and its corresponding language. To do this we will create a few helper functions:

def categoryFromOutput(output):
    # Index of the largest log-probability, i.e. the most likely category
    top_n, top_i = output.topk(1)
    category_i = top_i[0].item()
    return all_categories[category_i], category_i

print(categoryFromOutput(output))

The first one takes the output of the neural network (for my name, in this case) and uses topk(1) to return the category with the highest probability, together with its index.
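Here is a minimal illustration of what topk(1) returns, using a made-up output for three hypothetical categories rather than a real model prediction. Because the logarithm is increasing, the largest log-probability also belongs to the largest probability.

```python
import torch

all_categories = ['English', 'Italian', 'Spanish']  # hypothetical, shortened list

# A made-up output: log-probabilities for one name over three categories
output = torch.log(torch.tensor([[0.2, 0.1, 0.7]]))

top_n, top_i = output.topk(1)   # largest value and its index along the last dim
category_i = top_i[0].item()
print(all_categories[category_i], category_i)   # Spanish 2
print(round(top_n[0].item(), 4))                # -0.3567, i.e. log(0.7)
```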

import random

def randomChoice(choice):
    return choice[random.randint(0, len(choice) - 1)]

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor

for i in range(10):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    print('category =', category, '/ line =', line)

Here randomChoice receives a list (such as all_categories, the list we created at the beginning of our code) and returns a random element of it. Next, randomTrainingExample uses this helper to select a random category and a random name of that category from our category_lines dictionary. Finally, it converts the category to a tensor holding its index and the name to its one-hot tensor; for demonstration, we print ten random examples. Try it out: every time you run this chunk of code you will get 10 different random selections, which is exactly what we are looking for.
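If you would rather get the same examples on every run (handy while debugging), you can seed Python's random module. This sketch uses a shortened, hypothetical category list; the built-in random.choice is an equivalent one-liner for randomChoice. If you also want PyTorch's weight initialization to be repeatable, torch.manual_seed plays the same role on the PyTorch side.

```python
import random

all_categories = ['English', 'Italian', 'Spanish']  # hypothetical, shortened list

def randomChoice(choice):
    # Same as the post: pick a random element by index
    return choice[random.randint(0, len(choice) - 1)]

# Seeding makes the "random" sequence repeatable
random.seed(0)
first_run = [randomChoice(all_categories) for _ in range(5)]
random.seed(0)
second_run = [randomChoice(all_categories) for _ in range(5)]
print(first_run == second_run)  # True

# The standard library's random.choice does the same job in one call
print(random.choice(all_categories) in all_categories)  # True
```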

One last step before training our RNN is to specify our loss function (remember I mentioned it earlier?), which tells our model how bad its predictions are. The appropriate loss function is tied to the function we chose for the last layer of our neural network, which is a LogSoftmax in our case. The matching loss function is NLLLoss, which stands for ‘Negative Log Likelihood Loss’. Again, do not worry too much about what these are or how they work for now, but if you are like me and just cannot control your curiosity, I have provided links to the documentation of every function. Okay, we specify this by creating the following variable:

lossFunction = nn.NLLLoss()
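To see what NLLLoss computes, here is a small sketch with random scores: it takes the log-probability the model assigned to the correct category and flips its sign, so a confident correct prediction (log-probability near 0) gives a loss near 0. It also checks that nn.CrossEntropyLoss, which applies LogSoftmax and NLLLoss together, gives the same number. The 18 categories and the target index are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(1, 18)    # raw scores for one name, 18 categories (assumed)
target = torch.tensor([3])     # index of the correct category (arbitrary)

log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

# NLLLoss picks out the log-probability of the correct class and negates it
manual = -log_probs[0, target.item()]

# CrossEntropyLoss combines LogSoftmax and NLLLoss in a single step
ce = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(nll, manual), torch.allclose(nll, ce))  # True True
```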

Each loop of training will:

  • Create input and target tensors
  • Create an initial hidden state full of zeroes
  • Read each letter in and keep the hidden state for the next letter
  • Compare final output to target
  • Back-propagate
  • Return the output and loss of that iteration

Note: The aforementioned bullet points have been extracted from the official PyTorch documentation.

# If you set this too high, it might explode. If too low, it might not learn
learning_rate = 0.005

def train(category_tensor, line_tensor):
    # Keeps track of hidden layer state
    hidden = rnn.initHidden()
    rnn.zero_grad()

    # Feed the name one letter at a time, carrying the hidden state forward
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = lossFunction(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()

With our training function defined, the only thing left to do is to pass it a bunch of examples. While doing this, it is good practice to keep track of two main things: how long training takes, and how the loss evolves. Before proceeding, I want to make a brief comment on the add_ function. It is an example of an in-place operation, which means that the result is stored in the operand itself instead of in a new tensor; in PyTorch, in-place operations are marked by a _ suffix. They save some memory, but they can be problematic when computing derivatives because they overwrite values that autograd may still need, which is why the PyTorch documentation discourages their use in general. In our train function the update is safe, because it modifies p.data (which autograd does not track) after backward() has already computed the gradients. Okay, with that out of the way we can continue to train our RNN.
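The manual update in train() is plain stochastic gradient descent. As a sketch, the snippet below applies that update to a small nn.Linear layer (a stand-in for our RNN) and performs the same step on an identical copy through torch.optim.SGD, then checks that both end up with the same weights. In larger projects the optimizer version is the more common choice, since it also supports options such as momentum.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
learning_rate = 0.005
model_a = nn.Linear(4, 2)          # small stand-in for our RNN
model_b = copy.deepcopy(model_a)   # identical starting weights
before = [p.detach().clone() for p in model_a.parameters()]
x, target = torch.randn(3, 4), torch.tensor([0, 1, 1])
loss_fn = nn.CrossEntropyLoss()

# Manual update, as in the post's train(): p = p - learning_rate * grad, in place
model_a.zero_grad()
loss_fn(model_a(x), target).backward()
for p in model_a.parameters():
    p.data.add_(p.grad.data, alpha=-learning_rate)

# The same step through PyTorch's built-in SGD optimizer (no momentum)
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
loss_fn(model_b(x), target).backward()
optimizer.step()

same = all(torch.allclose(a, b) for a, b in zip(model_a.parameters(), model_b.parameters()))
changed = any(not torch.equal(b, p) for b, p in zip(before, model_a.parameters()))
print(same, changed)  # True True
```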

First, we set a few training hyperparameters and define a function to keep track of time as follows:

import time
import math

n_iters = 100000
print_every = 5000
plot_every = 1000

# Keep track of losses for plotting
current_loss = 0
all_losses = []

def timeSince(since):
    now = time.time()
    seconds = now - since
    minutes = math.floor(seconds / 60)
    seconds -= minutes * 60
    return '%dminutes %dseconds' % (minutes, seconds)
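A slightly more compact version of the same helper uses Python's built-in divmod, which returns the quotient and the remainder in one call. The extra now parameter is a hypothetical addition so that the function can be checked with a fixed time.

```python
import time

def timeSince(since, now=None):
    # Hypothetical variant: `now` can be passed in for testing; defaults to the current time
    now = time.time() if now is None else now
    minutes, seconds = divmod(now - since, 60)
    return '%dminutes %dseconds' % (minutes, seconds)

print(timeSince(0, now=125.7))  # 2minutes 5seconds
```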

Second, we train on n_iters random examples, printing the iteration number, loss, name, and guess every print_every iterations, and storing the average loss every plot_every iterations.

start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    # Print iter number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    # Add current loss avg to list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

To see how our model’s loss evolved during training, we can use the visualization library matplotlib to plot all_losses:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)  # one point per plot_every iterations
plt.show()

And to evaluate how well it works on different categories we can code a confusion matrix as follows:

Note: A confusion matrix is a table summarizing the number of correct and incorrect predictions made by a classifier (or classification model) by comparing actual and predicted values: each row corresponds to an actual category and each column to a predicted one. It works for binary as well as multi-class tasks like ours, and it is a common way to evaluate the performance of machine and deep learning models.

# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000

# Just return an output given a line
def evaluate(line_tensor):
    hidden = rnn.initHidden()
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    return output

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output = evaluate(line_tensor)
    guess, guess_i = categoryFromOutput(output)
    category_i = all_categories.index(category)
    confusion[category_i][guess_i] += 1

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes: one tick per category, labeled with its name
ax.set_xticks(range(n_categories))
ax.set_yticks(range(n_categories))
ax.set_xticklabels(all_categories, rotation=90)
ax.set_yticklabels(all_categories)

plt.show()

Note the use of the function evaluate() instead of train(). They are basically the same, except that evaluate() does not compute the loss or perform backpropagation, because we don’t want to update the parameters at this point; we just want to see how well the model performs as it is.

Although your results might be a little different because we are using random examples, it will output something like this:

Here, we are interested in the squares off the main diagonal that are very bright: the brighter they are, the more often our model predicted the column’s language for names from the row’s language. For example, it seems that it often confuses Chinese with Korean and English with Scottish. Overall, the language with the worst results is English. Understanding why requires a bit of research on linguistics and the overall structure of each language, but I hope you get the idea :)
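To read a confusion matrix without the plot, here is a sketch on a tiny, made-up set of predictions for three categories (not real results from our model). Rows are the actual language and columns the predicted one, so after row normalization the diagonal gives the fraction of each language the model got right, and an off-diagonal cell such as (Chinese, Korean) gives the share of Chinese names predicted as Korean. Dividing by the row sums with keepdim=True does the same normalization as the loop in the post, in one line.

```python
import torch

categories = ['Chinese', 'Korean', 'Spanish']   # hypothetical, shortened list
# Made-up (actual, predicted) index pairs, not real model results
pairs = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (2, 2), (2, 2)]

confusion = torch.zeros(len(categories), len(categories))
for actual, predicted in pairs:
    confusion[actual][predicted] += 1   # rows = actual, columns = predicted

# Normalize each row so it sums to 1, as in the post
confusion = confusion / confusion.sum(dim=1, keepdim=True)

# The diagonal holds the per-category accuracy; off-diagonal cells are confusions
for i, name in enumerate(categories):
    print(name, round(confusion[i][i].item(), 2))
# Chinese 0.67
# Korean 0.67
# Spanish 1.0
```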

To conclude, let’s write one last function to make predictions on user input:

def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    with torch.no_grad():
        output = evaluate(lineToTensor(input_line))

        # Get top N categories
        topv, topi = output.topk(n_predictions, 1, True)
        predictions = []

        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])

    return predictions

predict('Augusto')
predict('Jackson')
predict('Gustavo')
predict('Ellie')
predict('Chang')

Remember that the printed values are log-probabilities, which are never positive: the larger the value (that is, the closer it is to 0), the more likely the category.
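Since the printed numbers are log-probabilities, exponentiating them recovers ordinary probabilities, which are often easier to read. Here is a sketch using a made-up output for three hypothetical categories instead of a real prediction.

```python
import math
import torch

all_categories = ['English', 'Italian', 'Spanish']  # hypothetical, shortened list
# A made-up model output: log-probabilities for one name
output = torch.log(torch.tensor([[0.15, 0.60, 0.25]]))

topv, topi = output.topk(2, 1, True)   # top 2 values along dim 1, largest first
for value, index in zip(topv[0].tolist(), topi[0].tolist()):
    # math.exp turns a log-probability back into a probability
    print('(%.2f) %s -> %.0f%%' % (value, all_categories[index], 100 * math.exp(value)))
# (-0.51) Italian -> 60%
# (-1.39) Spanish -> 25%
```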

That concludes this post, I tried to make it as comprehensive and detailed as possible. I hope you enjoyed it and definitely let me know if you have any comments or suggestions. This is a learning process for me as well!

Here is the full notebook if you want to experiment with it and run it along with this tutorial:


Augusto Gonzalez-Bonorino

2nd year PhD Economics at Claremont Grad Univ. From Argentina. I created the Entangled Mind blog. Check it out ;) https://agbonorino.medium.com/membership