This is my first time building a deep learning model from scratch, and I wasn’t sure where to start. Thanks to Jeremy Howard’s Practical Deep Learning for Coders course, I gave it a shot and built my first deep learning model: a digit classifier. His course also encouraged me to share what I learn with others.
In this blog post, I’ll share how I built a simple neural network using FastAI and PyTorch. I’ll walk through how I prepared the data, built the model, and trained it. I hope my experience helps others who are also trying to build their first model.
Loading the Training Dataset
Although FastAI comes with the MNIST dataset, it needs to be loaded and prepared for training.
The data is available through URLs in the fastai library, and once downloaded it has the following directory structure:
├── training/
│ ├── 0/
│ │ ├── img1
│ │ └── img2
│ ├── 1/
│ │ ├── img2
│ │ └── img1
├── testing/
│ ├── 0/
│ │ ├── img1
│ │ └── img2
│ ├── 1/
│ │ ├── img1
│ │ └── img2
We can write a few lines of code to load these images as tensors:
from fastai.vision.all import *

path = untar_data(URLs.MNIST)  # Download (if needed) and get the dataset path

x = []
y = []
for i in range(10):
    img_paths = (path/'training'/str(i)).ls()  # Get the image paths of a digit
    # Load the images and convert to tensors
    x.extend([tensor(Image.open(o)) for o in img_paths])
    # Store the corresponding digit label for all images in this folder
    y.extend([tensor(i)] * len(img_paths))
Data Preprocessing
I now had a \(28 \times 28\) matrix for each image. To pass this as input to the model, I needed to flatten it into a single tensor of 784 values, and scale the pixel values from [0, 255] to [0, 1] for better results.
# Convert the Python lists into PyTorch tensors, and scale the pixel values to [0, 1]
x_tensor = torch.stack(x).float()/255
y_tensor = torch.tensor(y)
# Each image is 28 * 28. So we flatten each image into a single flat tensor of size 784.
flattened_x = x_tensor.view(-1, 28*28)
flattened_y = y_tensor.unsqueeze(1)  # y_tensor has shape [60000]; unsqueeze makes it [60000, 1]
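To make the two preprocessing steps concrete, here is a minimal sketch in plain Python on a hypothetical 2 x 3 "image" (the real images are 28 x 28). The helper name flatten_and_scale is made up for illustration; it only mirrors what flattening and dividing by 255 do to each image, without using PyTorch.

```python
def flatten_and_scale(image, max_value=255):
    """Flatten a 2D grid of pixel values row by row and scale them to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]

# Hypothetical tiny grayscale image: 2 rows, 3 columns, values in [0, 255]
tiny_image = [
    [0, 51, 102],
    [153, 204, 255],
]

flat = flatten_and_scale(tiny_image)
print(len(flat))  # one value per pixel
print(flat)
```

The row-by-row order is the same order a 28 x 28 image ends up in after being reshaped to 784 values.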
It’s always a good idea to split the dataset into training and validation sets:
from fastai.data.transforms import RandomSplitter

splitter = RandomSplitter(valid_pct=0.2, seed=42)  # 80% training, 20% validation
train_idxs, val_idxs = splitter(range(len(flattened_x)))  # Splitter returns indices for the two sets
# Create the training and validation data from those indices
X_train = flattened_x[train_idxs]
y_train = flattened_y[train_idxs]
X_val = flattened_x[val_idxs]
y_val = flattened_y[val_idxs]
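If you are curious what a random splitter does, here is a rough sketch of the idea in plain Python. This is not fastai's implementation, and the function name random_split is hypothetical: it shuffles the indices with a fixed seed and cuts off the first 20% as the validation set. The same seed will not reproduce the exact indices RandomSplitter returns.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Return (train_idxs, val_idxs): a shuffled, non-overlapping split of range(n_items)."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    n_val = int(n_items * valid_pct)
    return idxs[n_val:], idxs[:n_val]

train_idxs, val_idxs = random_split(60000)
print(len(train_idxs), len(val_idxs))
```

Fixing the seed matters: it keeps the validation set the same between runs, so validation losses from different experiments stay comparable.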
As Jeremy Howard says, getting the data into your model is the trickiest and the most time-consuming part of building a deep learning model. Boy, it was tricky!
Now that I had the dataset ready, I trained the model.
Training the Model
I used a simple two-layer neural network to classify the digits. This architecture, while basic, was a good starting point for understanding how deep learning models work.
- Linear Layer
- ReLU activation Layer
- Linear Layer
- Softmax Output
Since this is a classification task with multiple classes, I used cross entropy loss as the loss function to guide training. With plain softmax as the output function I got NaN values, so I used log softmax for numerical stability.
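Here is a small sketch of why the plain version can break, written in plain Python with hypothetical logits. Large logits make exp overflow. Plain Python raises an OverflowError at that point, whereas PyTorch's floating-point tensors overflow to inf, and inf / inf then gives NaN. Subtracting the largest logit first avoids the problem without changing the result.

```python
import math

def naive_log_softmax(row):
    """log(softmax(row)) computed directly from the definition."""
    exps = [math.exp(v) for v in row]  # overflows once a logit is large enough
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def stable_log_softmax(row):
    """Same quantity, but shifted so the largest exponent is exp(0) = 1."""
    largest = max(row)
    shifted = [v - largest for v in row]
    log_sum = math.log(sum(math.exp(v) for v in shifted))
    return [v - log_sum for v in shifted]

large_logits = [1000.0, 999.0, 998.0]  # hypothetical logits

try:
    naive_log_softmax(large_logits)
    naive_failed = False
except OverflowError:
    naive_failed = True

stable = stable_log_softmax(large_logits)
print(naive_failed)
print(stable)
```

For small logits both functions agree, so the shift is purely a numerical safeguard.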
For this model, I had two sets of parameters:
- Weights and biases for layer 1, and
- Weights and biases for layer 2
It’s not a terrible idea to randomly initialize these weights. I defined a helper function for each piece of the model:
- Initializing random weights
- Computing cross entropy loss
- Computing log softmax
- Matrix multiplication for the linear layers
def init_params(size, std=1.0):
    """Randomly initializes parameters of given size"""
    return (torch.randn(size)*std).requires_grad_()

def linear(xb, weights, bias):
    """Does matrix multiplication with inputs and weights: y = mx + c"""
    return xb @ weights + bias

def log_softmax(logits):
    max_logits = logits.max(dim=1, keepdim=True).values
    stable_logits = logits - max_logits  # For numerical stability
    # Compute log-softmax
    log_probs = stable_logits - torch.log(torch.exp(stable_logits).sum(dim=1, keepdim=True))
    return log_probs

def cross_entropy_loss(log_probs, yb):
    yb = yb.squeeze(1)  # [batch, 1] -> [batch]
    log_probs = log_probs.squeeze(1)
    # Pick the log-probability of the true class for each row
    true_class_log_probs = log_probs[range(len(yb)), yb]
    loss = -true_class_log_probs.mean()
    return loss
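A worked example may help with the cross entropy step. The sketch below uses plain Python and made-up probabilities for a batch of 2 images over 3 classes; it picks the log-probability of the true class in each row, averages, and negates, which is the same arithmetic as the tensor indexing in cross_entropy_loss.

```python
import math

# Hypothetical predicted probabilities for a batch of 2 images over 3 classes
probs = [
    [0.7, 0.2, 0.1],  # true class is 0
    [0.1, 0.3, 0.6],  # true class is 2
]
labels = [0, 2]

log_probs = [[math.log(p) for p in row] for row in probs]

# Pick the log-probability of the true class in each row, then average and negate
true_class_log_probs = [row[label] for row, label in zip(log_probs, labels)]
loss = -sum(true_class_log_probs) / len(true_class_log_probs)
print(round(loss, 4))
```

The loss here is about 0.434. It would be 0 if the model put probability 1 on every true class, and it grows as the true class gets less probability.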
Then, I used the DataLoader class in FastAI, which passes the input tensors to the model in batches. I also initialized the model parameters.
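# Randomly initialize weights and biases for both the layers
w1 = init_params((28*28,128))
b1 = init_params(128)
w2 = init_params((128,10))
b2 = init_params(10)
# Assumption: the datasets are not shown in this post; pairing each input with its label is one common way to build them
train_dataset = list(zip(X_train, y_train))
validation_dataset = list(zip(X_val, y_val))
train_dataloader = DataLoader(train_dataset, batch_size=256, shuffle=True)
validation_dataloader = DataLoader(validation_dataset, batch_size=256, shuffle=True)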
Finally, the training phase:
epochs = 25  # Train for 25 epochs
learning_rate = 0.001
for i in range(epochs):
    for xb, yb in train_dataloader:
        # Reset gradients that have accumulated in the last batch
        if w1.grad is not None:
            w1.grad.zero_()
        if b1.grad is not None:
            b1.grad.zero_()
        if w2.grad is not None:
            w2.grad.zero_()
        if b2.grad is not None:
            b2.grad.zero_()
        # Forward pass
        hidden = torch.relu(linear(xb, w1, b1))  # 1st linear + ReLU
        probabilities = log_softmax(linear(hidden, w2, b2))  # 2nd linear + log softmax
        # Compute loss for the current training batch
        loss = cross_entropy_loss(probabilities, yb)
        # Backpropagation: use PyTorch functionality to do backpropagation
        loss.backward()
        # Update weights and biases
        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            b1 -= learning_rate * b1.grad
            w2 -= learning_rate * w2.grad
            b2 -= learning_rate * b2.grad
    # Compute loss on the validation dataset with the same model
    epoch_loss = 0
    with torch.no_grad():  # No gradients are needed for validation
        for xb, yb in validation_dataloader:
            hidden = torch.relu(linear(xb, w1, b1))
            probabilities = log_softmax(linear(hidden, w2, b2))
            loss = cross_entropy_loss(probabilities, yb)
            epoch_loss += loss.item()
    average_loss = epoch_loss / len(validation_dataloader)
    # Print the loss after each epoch
    print(f'Epoch [{i+1}/{epochs}], Loss: {average_loss:.4f}')
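The update step in the loop is plain gradient descent: each parameter moves a small step against its gradient. Here is a minimal sketch of that rule on a hypothetical one-parameter problem in plain Python, with the gradient worked out by hand instead of by backpropagation. The data, starting value, and learning rate are all made up for this toy case.

```python
def loss_fn(w, x, y):
    """Squared error of the prediction w * x against the target y."""
    return (w * x - y) ** 2

def grad_fn(w, x, y):
    """Derivative of loss_fn with respect to w."""
    return 2 * (w * x - y) * x

x, y = 2.0, 6.0      # hypothetical single training example; the best w is 3
w = 0.0              # starting value
learning_rate = 0.1  # chosen for this toy problem, not the 0.001 used in the post

losses = []
for step in range(25):
    losses.append(loss_fn(w, x, y))
    w -= learning_rate * grad_fn(w, x, y)  # same form as the weight updates in the training loop

print(w)
print(losses[0], losses[-1])
```

The parameter ends up very close to 3 and the loss shrinks towards 0. The real model does the same thing, only with thousands of parameters and gradients computed automatically.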
Evaluating the Model
After training, it was time to test the predictions on the test dataset, the moment of truth!
I performed the same preprocessing steps on the test dataset as on the training dataset: load the images, scale the pixel values, and flatten each image into a single tensor.
x_test = []
y_test = []
for i in range(10):
    img_paths = (path/'testing'/str(i)).ls()  # Get the image paths in folder 'i'
    # Load the images and convert to tensors
    x_test.extend([tensor(Image.open(o)) for o in img_paths])
    # Store the corresponding label `i` for all images in this folder
    y_test.extend([tensor(i)] * len(img_paths))
x_test_tensor = torch.stack(x_test).float()/255
y_test_tensor = torch.tensor(y_test)
# Each image is 28 * 28. So we flatten each image into a single flat tensor of size 784.
flattened_x_test = x_test_tensor.view(-1, 28*28)
flattened_y_test = y_test_tensor.unsqueeze(1)
I used the trained model to compute predictions, and measured the fraction of them that were correct.
# Forward pass through the network
hidden = torch.relu(linear(flattened_x_test, w1, b1))
probabilities = log_softmax(linear(hidden, w2, b2))
# The predicted class is the one with the highest log-probability
predicted_classes = torch.argmax(probabilities, dim=1)
correct_predictions = (predicted_classes == y_test_tensor).sum().item()
# Calculate the number of incorrect predictions
incorrect_predictions = len(y_test_tensor) - correct_predictions
accuracy = correct_predictions / len(y_test_tensor)
print(f'Accuracy on test dataset: {accuracy * 100:.2f}%')
The model achieves about 87% accuracy! For my first model, not bad at all!
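For anyone wondering how that number comes out of the log softmax output, here is a tiny sketch in plain Python with made-up log-probabilities for 4 images over 3 classes. Because the logarithm preserves order, taking the argmax of the log-probabilities picks the same class as taking the argmax of the probabilities. The argmax helper below is written for this example.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda j: row[j])

# Hypothetical log-probabilities for 4 images over 3 classes
log_probs = [
    [-0.2, -2.0, -3.0],
    [-1.5, -0.4, -2.5],
    [-0.9, -1.1, -1.4],
    [-2.2, -1.9, -0.3],
]
labels = [0, 1, 2, 2]

predicted = [argmax(row) for row in log_probs]
correct = sum(p == t for p, t in zip(predicted, labels))
accuracy = correct / len(labels)
print(predicted, accuracy)
```

Three of the four made-up predictions match their labels, so the accuracy in this toy case is 0.75.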
Final Thoughts
Building this model was much more helpful than simply watching tutorials. There’s a lot of room for improvement on this model, like increasing the complexity of the model, exploring errors, or trying a different architecture. The possibilities to experiment are endless, and I hope to continue experimenting!
If you’re reading this and thinking about building your own first model, it’s worth it! Happy coding! ✌️
P.S. Take a look at FastAI’s course: Practical Deep Learning for Coders. It’s truly one of a kind!