Deep Learning from Scratch
I learn best through implementation. To that end, I implemented PyTorch
from scratch: github.com/johnma2006/candle.
I use it almost exclusively for my deep learning experiments. It is definitely slower
than an actual framework with accelerator support, but it's much more fun this way.
Everything below is implemented from scratch in pure NumPy:
- Tensor-based autograd (see the minimal sketch after this list)
- Layers: MHA / GQA / rotary / sparse attention with KV caching, batch / layer / RMS norm, conv2d
- NLP: BPE, SentencePiece processor, LoRA fine-tuning, top-k / nucleus / beam search (see the sampling sketch after this list), speculative sampling, chat templates (Llama chat, ChatML), streaming chat UI
- Models: Gemma (todo), Mixtral, Mamba, Llama, GPT, ResNet
- Lightweight Tensorboard-like dashboarding
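To give a flavor of the tensor-based autograd, here is a minimal sketch in pure NumPy. It is not candle's actual API, just an illustration of the idea: each op records its parent tensors and a backward rule, and backward() applies the chain rule in reverse topological order.

```python
import numpy as np

class Tensor:
    # Illustrative minimal autograd tensor, not the repo's actual class.
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self.parents = parents       # tensors this one was computed from
        self.backward_fn = None      # set by the op that produced this tensor

    def __add__(self, other):
        out = Tensor(self.data + other.data, parents=(self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, parents=(self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then run each node's backward rule
        # from the output back to the leaves.
        order, visited = [], set()
        def visit(t):
            if id(t) not in visited:
                visited.add(id(t))
                for p in t.parents:
                    visit(p)
                order.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(order):
            if t.backward_fn is not None:
                t.backward_fn()

# Example: z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2.
x, y = Tensor(2.0), Tensor(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0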
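And a rough sketch of top-k / nucleus (top-p) sampling over a vector of next-token logits, again in pure NumPy. The function name and signature are illustrative assumptions, not candle's API.

```python
import numpy as np

def sample_next_token(logits, top_k=None, top_p=None, temperature=1.0, rng=None):
    # Illustrative sampler: temperature scaling, then optional top-k and top-p filtering.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    if top_k is not None:
        # Mask everything outside the k highest-scoring tokens.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_p is not None:
        # Keep the smallest set of tokens whose cumulative probability >= top_p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = 1.0
        probs = probs * mask
        probs /= probs.sum()

    return rng.choice(len(probs), p=probs)

# Example usage on made-up logits for a 6-token vocabulary.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0, 0.1], top_k=3, top_p=0.9))
```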
Language Modelling

(video sped up 30x)

Optimization


Generalization

Vision
