
Deep Learning from Scratch
FC · RNN/LSTM · U-Net · GPT, from scratch
Graduate DL coursework with trained weights bundled; source private.
Graduate deep-learning coursework rebuilt as one portfolio piece spanning four architectures, each written from scratch rather than imported. HW1 implements fully-connected layers as a custom torch.autograd.Function with manually derived forward and backward passes (analytic gradients verified), and uses them to solve XOR and Iris. HW4 hand-codes a recurrent cell and an LSTM cell (manual input, forget, output and cell gates) for character-level generation: a 66K-parameter RNN and an 873K-parameter LSTM at hidden size 256 with 2 layers. HW3 is a 36-class semantic segmenter that compares a 17-layer fully-convolutional baseline against an 89-layer model (ResNet-18 encoder plus U-Net decoder with skip connections) trained with a class-weighted, label-smoothed loss, full D8 augmentation, test-time augmentation (TTA) and a 600-epoch cosine schedule. HW5 builds a decoder-only GPT transformer (multi-head self-attention, positional embeddings) trained on WikiText-2 with the GPT-2 BPE tokenizer (50,257-token vocabulary); the improved 30.5M-parameter model (d_model 256, 8 heads, 6 layers) reaches a test perplexity of 178.5, down from 269.4 for the 13.7M-parameter baseline.
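Since the source is private, the following is only a minimal sketch of the HW1 idea: a fully-connected layer written as a custom torch.autograd.Function with a hand-derived backward pass. The class name `LinearFn`, the tensor shapes, and the use of `torch.autograd.gradcheck` to verify the analytic gradients are all assumptions made for illustration, not the coursework's actual code.

```python
import torch


class LinearFn(torch.autograd.Function):
    """Hypothetical fully-connected layer: y = x @ W^T + b."""

    @staticmethod
    def forward(ctx, x, weight, bias):
        # Save the tensors the backward pass needs.
        ctx.save_for_backward(x, weight)
        return x @ weight.t() + bias

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # Manually derived gradients of y = x W^T + b:
        grad_x = grad_out @ weight       # dL/dx, shape (batch, in_features)
        grad_w = grad_out.t() @ x        # dL/dW, shape (out_features, in_features)
        grad_b = grad_out.sum(dim=0)     # dL/db, shape (out_features,)
        return grad_x, grad_w, grad_b


torch.manual_seed(0)
# gradcheck compares analytic and finite-difference gradients;
# double precision keeps the numerical estimate accurate.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(2, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(LinearFn.apply, (x, w, b), eps=1e-6, atol=1e-4)
print("analytic gradients match numerical gradients:", ok)
```

A check like this is one common way to confirm a hand-written backward pass before stacking such layers into a network for XOR or Iris.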
- Python
- PyTorch
- Transformers
- GPT-2 BPE
- WikiText-2
- ResNet
- U-Net
- LSTM
- Slurm / HPC
- GPT test perplexity: 269.4 → 178.5
- Improved GPT params: 30.5M
- From-scratch LSTM: 873K params
- Segmentation: 36 classes · 89-layer U-Net
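
To put the perplexity figures above in context, the sketch below assumes the usual definition: perplexity is the exponential of the mean per-token cross-entropy, measured in nats. The source does not state how its perplexity was computed, so this definition and the helper names `perplexity` and `mean_nll` are assumptions.

```python
import math


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def mean_nll(ppl):
    """Inverse mapping: the mean per-token loss implied by a perplexity."""
    return math.log(ppl)


# Reported test perplexities from the project summary.
base_ppl, improved_ppl = 269.4, 178.5

base_loss = mean_nll(base_ppl)
improved_loss = mean_nll(improved_ppl)
relative_drop = (base_ppl - improved_ppl) / base_ppl

print(f"baseline loss: {base_loss:.3f} nats/token")
print(f"improved loss: {improved_loss:.3f} nats/token")
print(f"relative perplexity reduction: {relative_drop:.1%}")
```

Under that assumption, the drop from 269.4 to 178.5 corresponds to a mean test loss falling from roughly 5.60 to 5.18 nats per token, which is about a one-third reduction in perplexity.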