The Shakespeare In Me

In this project, I implemented the transformer architecture from scratch to create a decoder-only GPT that sounds like Shakespeare.
Data Science
Author

Vikrant Mehta

Published

December 14, 2024

Project Overview

In this project, I built a character-level, decoder-only GPT from scratch. I trained the model on the “Tiny Shakespeare” dataset and implemented the entire transformer architecture in PyTorch, including multi-head attention blocks, residual connections, and layer normalization.
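
To give a sense of what that looks like, here is a minimal sketch of one such transformer block in PyTorch: causal multi-head self-attention followed by a feed-forward MLP, each wrapped in a residual connection with pre-layer normalization. Hyperparameter names like `n_embd`, `n_head`, and `block_size` are illustrative assumptions, not the exact values from my notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask, as in a decoder-only GPT."""
    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular mask so each position attends only to the past.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape each into (B, n_head, T, head_dim).
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)  # scaled dot-product
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)  # re-merge heads
        return self.dropout(self.proj(out))

class Block(nn.Module):
    """Transformer block: attention and MLP, each with a residual connection
    and pre-layer normalization."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around feed-forward
        return x
```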

I relied heavily on Andrej Karpathy’s tutorial and Mitesh Khapra Sir’s lectures to understand the internals of transformers better.

I trained the model on a Google Colab GPU for about 40 minutes, or 5000 epochs. At that point the loss was around 1.40, and the model produced a pretty convincing Shakespeare-like output, which I have added to the more.txt file.
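
For readers curious how the training itself works, here is a minimal sketch of the character-level data pipeline and training loop, reusing the `Block` class from the sketch above. The file path, hyperparameter values, and learning rate are illustrative assumptions (device placement is omitted for brevity), not the exact configuration from my notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Character-level tokenization: every unique character in the corpus gets an id.
text = open("input.txt").read()  # path to Tiny Shakespeare (assumed)
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

block_size, batch_size, n_embd, n_head, n_layer = 256, 64, 384, 6, 6  # illustrative

class TinyGPT(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a character-logit head."""
    def __init__(self, vocab_size):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(
            *[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)            # final layer norm
        self.head = nn.Linear(n_embd, vocab_size)  # project back to character logits

    def forward(self, idx):
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        return self.head(self.ln_f(self.blocks(x)))

def get_batch():
    # Random windows of text; the target is the input shifted one character right.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

model = TinyGPT(vocab_size=len(chars))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
for step in range(5000):
    xb, yb = get_batch()
    logits = model(xb)  # (batch, time, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```

The loss here is plain next-character cross-entropy: the logits and targets are flattened so every position in every sequence counts as one prediction.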

Tech Stack

  1. PyTorch
  2. Python (Jupyter Notebooks)

GitHub Repository
For more details, the full source code is available on GitHub: Source Code

And for the curious, the generated output can be found here: more.txt