The Shakespeare In Me

In this project, I implemented the GPT-2 architecture from scratch.
Data Science
Author

Vikrant Mehta

Published

January 17, 2025

Project Overview

In this project, I implemented the GPT-2 architecture from scratch in PyTorch. I trained the model on the FineWeb-Edu dataset and evaluated it on the HellaSwag benchmark.
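To make the evaluation step more concrete, here is a minimal sketch of one common way to score a HellaSwag example with a language model: compute the per-token loss for each candidate ending and pick the ending with the lowest average loss over its completion tokens. This is an assumption about the scoring scheme, not necessarily what this project does, and the names `pick_ending` and `accuracy` are hypothetical helpers. The losses are passed in as plain numbers, so no model is needed to follow the logic.

```python
from typing import Sequence


def pick_ending(
    token_losses: Sequence[Sequence[float]],
    completion_mask: Sequence[Sequence[int]],
) -> int:
    """Return the index of the candidate with the lowest mean completion loss.

    token_losses[i][t]   : loss of token t in candidate i (context + ending)
    completion_mask[i][t]: 1 if token t belongs to the ending, 0 if it is context
    """
    best_index, best_score = -1, float("inf")
    for i, (losses, mask) in enumerate(zip(token_losses, completion_mask)):
        n_completion = sum(mask)
        if n_completion == 0:
            raise ValueError(f"candidate {i} has no completion tokens")
        # Average only over the ending tokens; the shared context is ignored.
        mean_loss = sum(l * m for l, m in zip(losses, mask)) / n_completion
        if mean_loss < best_score:
            best_index, best_score = i, mean_loss
    return best_index


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples where the predicted ending matches the label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)
```

Averaging over the ending tokens, rather than summing, keeps longer endings from being penalised simply for having more tokens.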

For training, I rented a few Lambda Labs GPUs.

I relied heavily on Andrej Karpathy’s tutorial and Mitesh Khapra Sir’s lectures to understand the internals of GPT-2 better.

Tech Stack

  1. PyTorch
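
As an illustration of what a from-scratch GPT-2 looks like in PyTorch, here is a minimal sketch of a single GPT-2 style transformer block: pre-LayerNorm, causal multi-head self-attention, and a 4x-wide MLP with GELU, each wrapped in a residual connection. The class and attribute names are my own choices for this example and may not match the repository. It assumes PyTorch 2.0 or newer for `F.scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0, "n_embd must be divisible by n_head"
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # joint q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) so each head attends independently.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # is_causal=True masks out future positions.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


class Block(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(approximate="tanh"),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln_1(x))  # residual around attention
        x = x + self.mlp(self.ln_2(x))   # residual around MLP
        return x
```

A full model would stack several such blocks between token and position embeddings at the input and a final LayerNorm plus a linear language-model head at the output.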

GitHub Repository
For more details, the full source code can be found on GitHub: Source Code