John Z. Ma \ Deep Learning from Scratch

Deep Learning from Scratch

I learn best through implementation. To that end, I implemented PyTorch from scratch: github.com/johnma2006/candle. I use it almost exclusively for my deep learning experiments. It is definitely slower than an actual framework with accelerator support, but it's much more fun this way.

Everything below is implemented from scratch in pure NumPy:

Tensor-based autograd
Layers: MHA / GQA / rotary / sparse attention with KV caching, batch / layer / RMS norm, conv2d
NLP: BPE, SentencePiece processor, LoRA fine-tuning, top-k / nucleus / beam search, speculative sampling, chat templates (Llama chat, ChatML), streaming chat UI
Models: Gemma (todo), Mixtral, Mamba, Llama, GPT, ResNet
Lightweight TensorBoard-like dashboarding
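To make "tensor-based autograd" concrete, here is a minimal sketch of reverse-mode differentiation in pure NumPy. It is not candle's code or API: the `Tensor` class and its methods are hypothetical, and `+` and `*` assume operands of identical shape (no broadcasting). Each operation records its parents and a closure that pushes the output gradient back to them; `backward` visits the graph in reverse topological order so every node's gradient is complete before it propagates.

```python
import numpy as np

class Tensor:
    """Hypothetical minimal autograd tensor (same-shape +/*, 2-D matmul)."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents       # tensors this one was computed from
        self._backward_fn = None      # pushes self.grad into the parents' grads

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data, (self, other))
        def backward_fn():
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad
        out._backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), (self,))
        def backward_fn():
            self.grad += np.ones_like(self.data) * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then run each node's backward in reverse.
        order, visited = [], set()
        def visit(t):
            if id(t) not in visited:
                visited.add(id(t))
                for p in t._parents:
                    visit(p)
                order.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)   # d(self)/d(self) = 1
        for t in reversed(order):
            if t._backward_fn is not None:
                t._backward_fn()

# Usage: gradient of sum(x @ w) with respect to x and w.
x = Tensor([[1.0, 2.0], [3.0, 4.0]])
w = Tensor([[0.5], [-1.0]])
loss = (x @ w).sum()
loss.backward()
# w.grad == x^T @ ones(2, 1) == [[4.], [6.]]
```

A real framework adds broadcasting-aware gradients, many more ops, and no-grad modes; the closure-per-op design above is just one common way to structure it.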
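As an illustration of the grouped-query attention (GQA) and KV caching items above, here is a hedged NumPy sketch. The helper `grouped_query_attention` is hypothetical and single-sequence (no batch axis); it assumes each key/value head is shared by a contiguous group of query heads. With a KV cache, the query covers only the newest positions while keys and values cover everything seen so far, so the causal mask is shifted by `kv_len - q_len`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift by the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """Causal grouped-query attention (hypothetical helper, not candle's API).

    q:    (n_heads, q_len, head_dim)      queries for the newest q_len positions
    k, v: (n_kv_heads, kv_len, head_dim)  keys/values for all positions so far,
          with n_heads divisible by n_kv_heads and kv_len >= q_len
    """
    n_heads, q_len, head_dim = q.shape
    n_kv_heads, kv_len, _ = k.shape
    group = n_heads // n_kv_heads
    # Share each KV head across `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (n_heads, q_len, kv_len)
    # Query i sits at absolute position offset + i and may attend to positions <= offset + i.
    offset = kv_len - q_len
    mask = np.triu(np.ones((q_len, kv_len), dtype=bool), k=offset + 1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v  # (n_heads, q_len, head_dim)

rng = np.random.default_rng(0)
n_heads, n_kv_heads, seq_len, head_dim = 4, 2, 5, 8
q = rng.standard_normal((n_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

# Full (prefill) pass over all positions.
full = grouped_query_attention(q, k, v)

# Incremental decoding: start from cached K/V for the first seq_len - 1 positions,
# append the newest token's K/V, and attend with only its query.
k_cache, v_cache = k[:, :-1], v[:, :-1]
k_cache = np.concatenate([k_cache, k[:, -1:]], axis=1)
v_cache = np.concatenate([v_cache, v[:, -1:]], axis=1)
step = grouped_query_attention(q[:, -1:], k_cache, v_cache)
```

The decoding step should reproduce the last row of the full pass, which is the property that makes caching safe: earlier keys and values never change, so they need not be recomputed.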
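For the nucleus (top-p) sampling item, a minimal sketch under the usual definition: keep the smallest set of most-likely tokens whose cumulative probability reaches `top_p`, renormalize, and sample. The function names `nucleus_filter` and `sample_nucleus` are hypothetical, and this operates on a single 1-D logits vector; top-k sampling is the same idea with a fixed count instead of a probability threshold.

```python
import numpy as np

def nucleus_filter(logits, top_p):
    """Renormalized probabilities restricted to the top-p nucleus (hypothetical helper)."""
    probs = np.exp(logits - logits.max())    # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep tokens up to and including the first one whose cumulative mass reaches top_p.
    n_keep = np.searchsorted(cumulative, top_p) + 1
    keep = order[:n_keep]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def sample_nucleus(logits, top_p=0.9, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    probs = nucleus_filter(logits, top_p)
    return int(rng.choice(len(probs), p=probs))

# Usage: with probabilities [0.5, 0.3, 0.15, 0.05] and top_p = 0.7,
# only tokens 0 and 1 survive, renormalized to [0.625, 0.375].
logits = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
filtered = nucleus_filter(logits, top_p=0.7)
token = sample_nucleus(logits, top_p=0.7, rng=np.random.default_rng(0))
```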

Language Modelling

Chat with Mixtral 8x7B:

(video sped up 30x)

Chat with Llama 13B:
