Reading

Papers I find interesting on ML, LLMs, and optimization

Transformers & BERT (4 papers)
Attention Is All You Need, Vaswani et al. (2017)

The foundational paper that started the transformer revolution; essential for understanding modern LLMs. A minimal sketch of its scaled dot-product attention appears at the end of this section.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. (2018)

Changed how we think about language representation; bidirectional context was a game changer for NLP.

Solves the quadratic attention problem—crucial for processing long documents.

Fresh perspective on how BERT represents language geometrically.
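
Since scaled dot-product attention is the operation behind everything in this section, here is a minimal single-head NumPy sketch of it (no masking, no projections, no multi-head logic). The function and variable names are my own illustration, not code from the paper.

```python
# Minimal sketch of scaled dot-product attention; illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # attention distribution over keys
    return weights @ V                  # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 16)
```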

Graph Neural Networks (5 papers)

Comprehensive survey that ties together GCNs, GATs, and GraphSAGE—essential for relational data.

The paper that made GNNs practical.

Graph Attention Networks, Veličković et al. (2017)

Adapts the attention mechanism to graphs, letting each node weight its neighbours; see the sketch at the end of this section.

Inductive Representation Learning on Large Graphs (GraphSAGE), Hamilton et al. (2017)

Enables generalization to unseen nodes.

How Powerful are Graph Neural Networks?, Xu et al. (2018)

Connects GNNs to graph isomorphism testing.
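
For readers who want the graph attention idea in code, here is a rough single-head sketch of neighbour-restricted attention in NumPy. It follows the spirit of Veličković et al. (project features, score each edge with a learned vector, softmax over neighbours), but the names, the dense adjacency masking, and the omitted multi-head machinery are simplifications of my own, not the reference implementation.

```python
# Minimal single-head graph attention layer in the spirit of GAT; illustrative only.
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """H: (n, f_in) node features, A: (n, n) 0/1 adjacency with self-loops,
    W: (f_in, f_out) projection, a: (2 * f_out,) attention parameters."""
    Z = H @ W                                    # (n, f_out) projected features
    f = Z.shape[1]
    src = Z @ a[:f]                              # contribution of node i
    dst = Z @ a[f:]                              # contribution of neighbour j
    e = leaky_relu(src[:, None] + dst[None, :])  # e[i, j] = LeakyReLU(a^T [z_i || z_j])
    e = np.where(A > 0, e, -1e9)                 # mask out non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over each node's neighbours
    return alpha @ Z                             # attention-weighted neighbour average

# Toy example: 4 nodes on a path graph, 5 input features, 8 output features.
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
H = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 8))
a = rng.normal(size=(16,))
print(gat_layer(H, A, W, a).shape)  # (4, 8)
```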

LLM Optimization & Efficiency (5 papers)

LoRA: Low-Rank Adaptation of Large Language Models, Hu et al. (2021)

Train up to 10,000x fewer parameters while matching full fine-tuning; a minimal sketch of the low-rank update appears at the end of this section.

QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al. (2023)

Fine-tune a 65B-parameter model on a single GPU.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Dao et al. (2022)

IO-aware exact attention algorithm that reduces reads and writes to GPU HBM, giving a 2-4x speedup.

75% utilization on H100 GPUs with FP8.

Challenges the LoRA variant hype.
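
To make the LoRA entry above concrete, here is a toy NumPy sketch of the low-rank adapter idea: the pretrained weight stays frozen and only a rank-r update B @ A is trained. The dimensions, the alpha/r scaling, and the per-layer parameter counts are illustrative assumptions, not the authors' setup; the paper's 10,000x figure comes from freezing an entire very large model and adapting only a few of its matrices.

```python
# Toy sketch of a LoRA-style low-rank update on one linear layer; illustrative only.
import numpy as np

d_in, d_out, r, alpha = 1024, 1024, 8, 16   # assumed sizes, not from the paper

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # pretrained weight, kept frozen
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init: no change at start

def lora_forward(x):
    """x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
print(lora_forward(x).shape)                              # (2, 1024)
print("full fine-tuning params/layer:", d_out * d_in)     # 1048576
print("LoRA params/layer:", r * (d_in + d_out))           # 16384, ~64x fewer for this layer
```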

RAG & Knowledge Retrieval (3 papers)

Comprehensive overview of RAG architectures; a minimal retrieve-then-generate sketch follows at the end of this section.

Autonomous agents meet retrieval.

Uses graph structures as dynamic knowledge indices.
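
As a companion to the RAG entries, here is a self-contained sketch of the retrieve-then-generate pattern: embed the corpus, rank documents against the query, and build a prompt from the top hits. The hashing-based embed() is a toy stand-in for a real embedding model, the corpus and query are invented, and the generation step is deliberately left out.

```python
# Toy sketch of retrieval-augmented prompting; illustrative only.
import numpy as np

def embed(text, dim=64):
    """Toy bag-of-words hashing embedding; a real system would use a trained model."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

corpus = [
    "Graph attention networks apply attention over a node's neighbours.",
    "LoRA trains low-rank adapters instead of the full weight matrices.",
    "FlashAttention tiles the attention computation to reduce HBM traffic.",
]
doc_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query, k=2):
    scores = doc_vecs @ embed(query)              # cosine similarity (vectors are unit norm)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

query = "How does LoRA reduce trainable parameters?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# The assembled prompt would then be handed to whatever LLM does the generation.
print(prompt)
```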