Project Overview
I enjoy building things from scratch. In this project, I built a WaveNet-like architecture for a language-modeling task from scratch. For the most part, I followed Andrej Karpathy’s tutorial on YouTube, along with additional notes I took along the way to understand PyTorch’s broadcasting and batched matrix multiplication (illustrated in the sketch below).
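To make those notes concrete, here is a minimal sketch, not taken from the notebook, of the two behaviours in question: how PyTorch broadcasts shapes of size 1, and how a matrix multiply with a 2D weight is applied across the leading batch dimensions of a 3D tensor.

```python
import torch

# Broadcasting: a (4, 1) column and a (1, 3) row expand to a common (4, 3) shape.
col = torch.arange(4).view(4, 1)   # shape (4, 1)
row = torch.arange(3).view(1, 3)   # shape (1, 3)
print((col + row).shape)           # torch.Size([4, 3])

# Batched matrix multiplication: for a 3D input, matmul with a 2D weight
# acts on the last dimension and is broadcast over the leading ones.
x = torch.randn(32, 4, 20)         # (batch, groups, features)
w = torch.randn(20, 64)            # (features, hidden)
print((x @ w).shape)               # torch.Size([32, 4, 64]) -- one matmul per group
```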
In this Jupyter notebook, I trained a character-level language model that predicts the next character from a context of eight characters. I implemented the neural-network modules from scratch, relying on PyTorch only for tensor operations rather than its built-in layers. In this architecture, flatten layers hierarchically fuse consecutive character embeddings from layer to layer, and the model reaches a decent loss; a sketch of that fusion step follows below.
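The fusion step can be sketched roughly as follows. The class name FlattenConsecutive and the exact shapes are assumptions based on the tutorial’s conventions, not necessarily what the notebook uses; the real model also interleaves linear, batch-norm, and tanh layers between the fusions.

```python
import torch

class FlattenConsecutive:
    """Merge every `n` consecutive character embeddings into one longer vector.
    (Name and layout are assumed from the tutorial's conventions.)"""
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        B, T, C = x.shape                        # (batch, context, embedding dim)
        x = x.view(B, T // self.n, C * self.n)   # fuse groups of n neighbours
        if x.shape[1] == 1:                      # fully fused: drop the group dim
            x = x.squeeze(1)
        self.out = x
        return self.out

# With a context of 8 characters and n=2, three such layers fuse
# 8 -> 4 -> 2 -> 1 groups, doubling the receptive field at every layer.
emb = torch.randn(32, 8, 10)                     # (batch, 8 chars, 10-dim embeddings)
for layer in [FlattenConsecutive(2), FlattenConsecutive(2), FlattenConsecutive(2)]:
    emb = layer(emb)
print(emb.shape)                                 # torch.Size([32, 80])
```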
The dataset used for the project is a names dataset, the same one used in the YouTube tutorial.
Tech Stack Used
- Python (Jupyter Notebooks)
- PyTorch
GitHub Repository
For more details, the full source code can be found on GitHub: Source Code