Deep Learning from Scratch

I learn best through implementation. To that end, I implemented PyTorch from scratch: github.com/johnma2006/candle. I use it almost exclusively for my deep learning experiments. It is definitely slower than an actual framework with accelerator support, but it's much more fun this way.

Everything below is implemented from scratch in pure NumPy:

  • Tensor-based autograd
  • Layers: MHA / GQA / rotary / sparse attention with KV caching, batch / layer / RMS norm, conv2d
  • NLP: BPE, SentencePiece processor, LoRA fine-tuning, top-k / nucleus / beam search, speculative sampling, chat templates (Llama chat, ChatML), streaming chat UI
  • Models: Gemma (todo), Mixtral, Mamba, Llama, GPT, ResNet
  • Lightweight Tensorboard-like dashboarding
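
To give a flavour of the first item, here is a minimal sketch of tensor-based reverse-mode autograd in NumPy. It is an illustration only and is not taken from candle: the `Tensor` class, its `parents` and `backward_fn` attributes, and the three supported ops (2-D matmul, ReLU, sum) are all hypothetical names chosen for this example.

```python
import numpy as np


class Tensor:
    """Hypothetical minimal tensor with reverse-mode autograd (not candle's API)."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self.parents = parents      # tensors this one was computed from
        self.backward_fn = None     # accumulates self.grad into the parents' grads

    def __matmul__(self, other):
        # Assumes both operands are 2-D matrices.
        out = Tensor(self.data @ other.data, parents=(self, other))

        def backward_fn():
            # For C = A @ B: dL/dA = dL/dC @ B^T and dL/dB = A^T @ dL/dC
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad

        out.backward_fn = backward_fn
        return out

    def relu(self):
        out = Tensor(np.maximum(self.data, 0.0), parents=(self,))

        def backward_fn():
            # Gradient flows only through entries that were positive.
            self.grad += out.grad * (self.data > 0)

        out.backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), parents=(self,))

        def backward_fn():
            # The scalar upstream gradient is broadcast to every input entry.
            self.grad += out.grad * np.ones_like(self.data)

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Build a topological order of the graph, then walk it in reverse
        # so each tensor's grad is complete before it is propagated further.
        order, seen = [], set()

        def visit(t):
            if id(t) in seen:
                return
            seen.add(id(t))
            for p in t.parents:
                visit(p)
            order.append(t)

        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(order):
            if t.backward_fn is not None:
                t.backward_fn()


# Usage: loss = sum(relu(x @ w)), then backpropagate.
x = Tensor([[1.0, -2.0], [3.0, 4.0]])
w = Tensor([[0.5, -1.0], [1.5, 2.0]])
loss = (x @ w).relu().sum()
loss.backward()
print(loss.data, x.grad, w.grad, sep="\n")
```

Each op records its inputs and a closure that knows how to push gradients back to them. `backward()` then only has to visit the graph in reverse topological order. A real implementation would also need broadcasting-aware gradients, many more ops, and a way to reset gradients between steps.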


Language Modelling

(video sped up 30x)



Optimization




Generalization



Vision



Safety