GPT-2 – Vikrant

Project Overview

In this project, I implemented the full GPT-2 architecture from scratch in PyTorch and pretrained it on the FineWeb-Edu dataset. The model is a reimplementation of, and an improvement on, the GPT-2 124M model.
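The "124M" in the model name is the parameter count implied by GPT-2 small's hyperparameters. The sketch below derives that number from a config object. It assumes the published GPT-2 small settings (12 layers, 12 heads, 768-dimensional embeddings, 50,257-token vocabulary, 1,024-token context) and weight tying between the token embedding and the output head, as in the original GPT-2. The names `GPTConfig` and `count_parameters` are hypothetical illustrations and are not taken from the project's repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    # Assumed hyperparameters of the published GPT-2 small (124M) model.
    vocab_size: int = 50257   # BPE vocabulary size
    block_size: int = 1024    # maximum context length (positions)
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # width of embeddings and the residual stream

    def __post_init__(self) -> None:
        # Each head gets an equal slice of the embedding (768 / 12 = 64).
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")


def count_parameters(cfg: GPTConfig) -> int:
    """Count trainable parameters, assuming the LM head is tied to the
    token embedding (so it contributes no extra weights)."""
    d = cfg.n_embd
    layer_norm = 2 * d                  # gain + bias
    attn_qkv = d * 3 * d + 3 * d        # fused Q, K, V projection (weight + bias)
    attn_proj = d * d + d               # attention output projection
    mlp_fc = d * 4 * d + 4 * d          # MLP expansion to 4*d
    mlp_proj = 4 * d * d + d            # MLP projection back to d
    per_block = 2 * layer_norm + attn_qkv + attn_proj + mlp_fc + mlp_proj

    token_emb = cfg.vocab_size * d      # shared with the (bias-free) LM head
    pos_emb = cfg.block_size * d        # learned absolute position embeddings
    final_ln = layer_norm               # LayerNorm after the last block
    return token_emb + pos_emb + cfg.n_layer * per_block + final_ln


if __name__ == "__main__":
    print(f"{count_parameters(GPTConfig()):,}")  # 124,439,808
```

Without weight tying, a separate output projection would add another `vocab_size * n_embd` (about 38.6M) parameters, which is one reason GPT-2 shares these weights.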

I relied heavily on Andrej Karpathy’s tutorial and Mitesh Khapra Sir’s lectures to better understand the internals of GPT-2.

Tech Stack

PyTorch

GitHub Repository
The full source code is available on GitHub: Source Code