Reading
Papers on ML, LLMs, and optimization that I find interesting
Transformers & BERT (4 papers)
Attention Is All You Need (Vaswani et al., 2017). The foundational paper that started the transformer revolution; essential for understanding modern LLMs, and the source of the attention sketch after this list.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018). Changed how we think about language representation; bidirectional context was a game changer for NLP.
Addresses the quadratic cost of self-attention, which is crucial for processing long documents.
Visualizing and Measuring the Geometry of BERT (Coenen et al., 2019). A fresh perspective on how BERT represents language geometrically.
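To make the first entry concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation softmax(QK^T / sqrt(d_k))V from Attention Is All You Need. The function name and toy shapes are my own; the paper's full model adds multi-head projections, masking, and positional encodings on top of this.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, row-wise over query positions."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # attention weights sum to 1 per query
    return weights @ V                              # weighted average of value vectors

# Toy usage: 4 queries attend over 6 key/value pairs, d_k = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)         # shape (4, 8)
```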
Graph Neural Networks (5 papers)
A Comprehensive Survey on Graph Neural Networks (Wu et al., 2021). Ties together GCNs, GATs, and GraphSAGE; essential for anyone working with relational data.
Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling, 2017). The paper that made GNNs practical; see the GCN-layer sketch after this list.
Graph Attention Networks (Veličković et al., 2018). Adapts the attention mechanism to graphs so each node can weight its neighbors.
Inductive Representation Learning on Large Graphs (Hamilton et al., 2017). GraphSAGE samples and aggregates neighborhoods, enabling generalization to unseen nodes.
How Powerful are Graph Neural Networks? (Xu et al., 2019). Connects GNN expressiveness to the Weisfeiler-Lehman graph isomorphism test.
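As a companion to the GCN entry, here is a minimal sketch of one graph-convolution layer following Kipf & Welling's propagation rule, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The function name and the toy path graph are illustrative assumptions, not code from the paper.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: aggregate normalized neighbor features, transform, ReLU."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops so a node sees itself
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)       # symmetric degree normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)  # H' = ReLU(A_norm H W)

# Toy usage: a 4-node path graph, 3-dim features, 2-dim output.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(1)
H, W = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
H_next = gcn_layer(A, H, W)                 # shape (4, 2)
```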
LLM Optimization & Efficiency (5 papers)
LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021). Trains up to 10,000x fewer parameters while matching full fine-tuning quality; see the adapter sketch after this list.
QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023). Fine-tunes 65B-parameter models on a single GPU.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al., 2022). An IO-aware exact-attention algorithm that reduces HBM reads for a 2-4x speedup.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Shah et al., 2024). Reaches 75% utilization on H100 GPUs using FP8.
Takes a critical look at the hype around LoRA variants.
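The LoRA idea is compact enough to sketch directly: freeze the pretrained weight W and learn a low-rank update (alpha/r)·BA alongside it. Below is a minimal PyTorch illustration under my own wrapper name, LoRALinear; it follows the paper's parameterization (A Gaussian-initialized, B zero-initialized) but is not the reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update, in the spirit of LoRA."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pretrained weight and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # Gaussian init, per the paper
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus the scaled low-rank update (alpha / r) * x A^T B^T.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Toy usage: wrap a frozen 512->512 projection; only A and B receive gradients.
layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16.0)
x = torch.randn(2, 512)
y = layer(x)                                  # shape (2, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)  # 2 * 8 * 512 = 8192
```

Zero-initializing B means the wrapped layer starts out exactly equal to the frozen base layer, so fine-tuning departs smoothly from the pretrained behavior.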
RAG & Knowledge Retrieval (3 papers)
A comprehensive overview of RAG architectures; the sketch after this list shows the basic retrieve-then-prompt loop.
Autonomous agents meet retrieval.
Uses graph structures as dynamic knowledge indices.
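To ground the RAG entries, here is a minimal sketch of the retrieve-then-prompt loop common to these systems: rank documents by cosine similarity to a query embedding, then pack the top hits into the prompt. The embeddings are random stand-ins (a real pipeline would call an embedding model), and every name here is my own.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=2):
    """Dense-retrieval core of a RAG pipeline: return indices of the k most
    cosine-similar documents to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = D @ q                              # cosine similarity per document
    return np.argsort(scores)[::-1][:k]         # highest-scoring k documents

docs = [
    "FlashAttention reduces HBM reads with an IO-aware tiling scheme.",
    "LoRA trains low-rank adapters instead of full weight matrices.",
    "GraphRAG indexes a corpus as a knowledge graph for retrieval.",
]
# Random stand-in embeddings keep the sketch self-contained.
rng = np.random.default_rng(2)
doc_vecs = rng.normal(size=(len(docs), 64))
query_vec = doc_vecs[2] + 0.1 * rng.normal(size=64)   # a query "near" doc 2

top = retrieve(query_vec, doc_vecs, k=2)
context = "\n".join(docs[i] for i in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is GraphRAG?"
```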