Project Overview
In this project, I implemented the full GPT-2 architecture from scratch in PyTorch and pretrained it on the FineWeb-Edu dataset. The model is a reimplementation of, and an improvement on, the 124M-parameter GPT-2 model.
I relied heavily on Andrej Karpathy’s tutorial and Prof. Mitesh Khapra’s lectures to better understand the internals of GPT-2.
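To illustrate the architecture described above, here is a minimal sketch of the pre-LayerNorm transformer block that GPT-2 is built from, with the 124M configuration (12 heads, 768-dim embeddings) as defaults. The class and parameter names are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, GPT-2 style (sketch, not the project's code)."""
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused query/key/value projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal mask: each position attends only to itself and earlier positions
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class Block(nn.Module):
    """One pre-LayerNorm transformer block: LayerNorm precedes each sublayer."""
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(approximate="tanh"),  # GPT-2 uses the tanh GELU approximation
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # residual connection around attention
        x = x + self.mlp(self.ln_2(x))   # residual connection around MLP
        return x

x = torch.randn(2, 16, 768)  # (batch, sequence length, embedding dim)
y = Block()(x)
print(y.shape)  # torch.Size([2, 16, 768]) — shape is preserved through the block
```

The full GPT-2 model stacks 12 such blocks between a token+position embedding layer and a final LayerNorm plus language-model head.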
Tech Stack
- PyTorch
GitHub Repository
The full source code is available on GitHub: Source Code