Project Overview
In this project, I implemented the GPT-2 architecture from scratch in PyTorch. I trained the model on the FineWeb-Edu dataset and evaluated it on the HellaSwag benchmark.
For training, I rented a few Lambda Labs GPUs.
I relied heavily on Andrej Karpathy’s tutorial and Mitesh Khapra Sir’s lectures to better understand the internals of GPT-2.
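To give a feel for what "from scratch" involves, here is a minimal sketch of a single GPT-2-style decoder block in PyTorch. It is an illustration only, not the code from the repository: the class name `Block`, the layer names, and the sizes used in the example are my assumptions, and it leans on PyTorch's built-in `F.scaled_dot_product_attention` (PyTorch 2.0+) rather than a hand-written attention kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    """One pre-LayerNorm decoder block: causal self-attention, then an MLP.

    Hypothetical sketch; names and sizes are illustrative assumptions.
    """

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0, "embedding size must divide evenly across heads"
        self.n_head = n_head
        self.ln_1 = nn.LayerNorm(n_embd)
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused query/key/value projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection after attention
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(approximate="tanh"),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        head_dim = C // self.n_head
        # Project once, then split into queries, keys, and values.
        q, k, v = self.c_attn(self.ln_1(x)).split(C, dim=2)
        # Reshape each to (B, n_head, T, head_dim) for multi-head attention.
        q = q.view(B, T, self.n_head, head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_head, head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_head, head_dim).transpose(1, 2)
        # is_causal=True masks out future positions.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Merge the heads back into a single (B, T, C) tensor.
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        x = x + self.c_proj(y)            # residual connection around attention
        x = x + self.mlp(self.ln_2(x))    # residual connection around the MLP
        return x


if __name__ == "__main__":
    block = Block(n_embd=32, n_head=4)
    out = block(torch.randn(2, 8, 32))  # (batch, sequence length, embedding size)
    print(out.shape)
```

A full model would stack several such blocks between token/position embeddings and a final LayerNorm plus output projection; those parts are left out here to keep the sketch short.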
Tech Stack
- PyTorch
GitHub Repository
For more details, the full source code is available on GitHub.