Reading

Papers I find interesting on ML, LLMs, and optimization

Transformers & BERT (4 papers)
Attention Is All You Need, Vaswani et al. (2017)

The foundational paper that started the transformer revolution; essential for understanding modern LLMs. A minimal sketch of its scaled dot-product attention appears at the end of this section.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. (2018)

Changed how we think about language representation; bidirectional context was a game changer for NLP.

Solves the quadratic attention problem—crucial for processing long documents.

Fresh perspective on how BERT represents language geometrically.
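
Since scaled dot-product attention is the operation behind everything in this section, here is a minimal single-head NumPy sketch of it (no masking, no projections, no multi-head logic). The function and variable names are my own illustration, not code from the paper.

```python
# Minimal sketch of scaled dot-product attention; illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # attention distribution over keys
    return weights @ V                  # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 16)
```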

Graph Neural Networks (5 papers)

Comprehensive survey that ties together GCNs, GATs, and GraphSAGE—essential for relational data.

The paper that made GNNs practical.

Graph Attention Networks, Veličković et al. (2017)

Adapts the attention mechanism to graphs, letting each node weight its neighbours; see the sketch at the end of this section.

Inductive Representation Learning on Large Graphs (GraphSAGE), Hamilton et al. (2017)

Enables generalization to unseen nodes.

How Powerful are Graph Neural Networks?, Xu et al. (2018)

Connects GNNs to graph isomorphism testing.
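
For readers who want the graph attention idea in code, here is a rough single-head sketch of neighbour-restricted attention in NumPy. It follows the spirit of Veličković et al. (project features, score each edge with a learned vector, softmax over neighbours), but the names, the dense adjacency masking, and the omitted multi-head machinery are simplifications of my own, not the reference implementation.

```python
# Minimal single-head graph attention layer in the spirit of GAT; illustrative only.
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """H: (n, f_in) node features, A: (n, n) 0/1 adjacency with self-loops,
    W: (f_in, f_out) projection, a: (2 * f_out,) attention parameters."""
    Z = H @ W                                    # (n, f_out) projected features
    f = Z.shape[1]
    src = Z @ a[:f]                              # contribution of node i
    dst = Z @ a[f:]                              # contribution of neighbour j
    e = leaky_relu(src[:, None] + dst[None, :])  # e[i, j] = LeakyReLU(a^T [z_i || z_j])
    e = np.where(A > 0, e, -1e9)                 # mask out non-neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over each node's neighbours
    return alpha @ Z                             # attention-weighted neighbour average

# Toy example: 4 nodes on a path graph, 5 input features, 8 output features.
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
H = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 8))
a = rng.normal(size=(16,))
print(gat_layer(H, A, W, a).shape)  # (4, 8)
```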

LLM Optimization & Efficiency (5 papers)

LoRA: Low-Rank Adaptation of Large Language Models, Hu et al. (2021)

Train up to 10,000x fewer parameters while matching full fine-tuning; a minimal sketch of the low-rank update appears at the end of this section.

QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al. (2023)

Fine-tune a 65B-parameter model on a single GPU.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Dao et al. (2022)

IO-aware exact attention algorithm that reduces reads and writes to GPU HBM, giving a 2-4x speedup.

75% utilization on H100 GPUs with FP8.

Challenges the LoRA variant hype.
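
To make the LoRA entry above concrete, here is a toy NumPy sketch of the low-rank adapter idea: the pretrained weight stays frozen and only a rank-r update B @ A is trained. The dimensions, the alpha/r scaling, and the per-layer parameter counts are illustrative assumptions, not the authors' setup; the paper's 10,000x figure comes from freezing an entire very large model and adapting only a few of its matrices.

```python
# Toy sketch of a LoRA-style low-rank update on one linear layer; illustrative only.
import numpy as np

d_in, d_out, r, alpha = 1024, 1024, 8, 16   # assumed sizes, not from the paper

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # pretrained weight, kept frozen
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init: no change at start

def lora_forward(x):
    """x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
print(lora_forward(x).shape)                              # (2, 1024)
print("full fine-tuning params/layer:", d_out * d_in)     # 1048576
print("LoRA params/layer:", r * (d_in + d_out))           # 16384, ~64x fewer for this layer
```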

RAG & Knowledge Retrieval (3 papers)

Comprehensive overview of RAG architectures; a minimal retrieve-then-generate sketch follows at the end of this section.

Autonomous agents meet retrieval.

Uses graph structures as dynamic knowledge indices.
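
As a companion to the RAG entries, here is a self-contained sketch of the retrieve-then-generate pattern: embed the corpus, rank documents against the query, and build a prompt from the top hits. The hashing-based embed() is a toy stand-in for a real embedding model, the corpus and query are invented, and the generation step is deliberately left out.

```python
# Toy sketch of retrieval-augmented prompting; illustrative only.
import numpy as np

def embed(text, dim=64):
    """Toy bag-of-words hashing embedding; a real system would use a trained model."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

corpus = [
    "Graph attention networks apply attention over a node's neighbours.",
    "LoRA trains low-rank adapters instead of the full weight matrices.",
    "FlashAttention tiles the attention computation to reduce HBM traffic.",
]
doc_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query, k=2):
    scores = doc_vecs @ embed(query)              # cosine similarity (vectors are unit norm)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

query = "How does LoRA reduce trainable parameters?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# The assembled prompt would then be handed to whatever LLM does the generation.
print(prompt)
```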