Attention Variants and Efficiency
3 questions · Difficulty 5-6
Question 1 of 3 · intermediate (5/10) · compare
Several "efficient attention" methods aim to reduce standard attention's O(n²) cost. Which preserves EXACT attention while reducing memory I/O?
A. FlashAttention: tiles Q, K, and V into SRAM and uses online softmax to avoid materializing the n × n score matrix, computing exact attention with reduced HBM I/O (sketched after the options)
B. Linformer: approximates attention with a low-rank factorization, giving linear complexity in sequence length
C. Longformer: uses sliding-window and global attention patterns, making it exact but faster for long documents
D. Linear Attention: replaces softmax with a kernel feature map, giving exact results at O(n) cost (reordering sketched after the options)
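For reference, here is a minimal single-query NumPy sketch of the online-softmax accumulation that option A describes. It is a didactic illustration, not the fused FlashAttention CUDA kernel (which also tiles queries and handles the backward pass); the function name, tile size, and single-query simplification are ours.

```python
import numpy as np

def online_softmax_attention(q, K, V, tile_size=4):
    """Exact attention for one query vector, computed tile by tile.

    q: (d,) query; K: (n, d) keys; V: (n, d_v) values.
    Matches softmax(K @ q) @ V, but only one tile of scores is ever
    held in memory, mirroring how FlashAttention avoids the n x n matrix.
    """
    m = -np.inf                  # running max of scores seen so far
    l = 0.0                      # running softmax normalizer
    acc = np.zeros(V.shape[1])   # running weighted sum of value rows

    for start in range(0, K.shape[0], tile_size):
        s = K[start:start + tile_size] @ q   # scores for this tile only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)            # rescale old state to the new max
        p = np.exp(s - m_new)                # unnormalized weights, this tile
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + tile_size]
        m = m_new

    return acc / l

# Sanity check against the naive computation that materializes all scores.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(32, 8))
V = rng.normal(size=(32, 8))
s = K @ q
naive = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
assert np.allclose(online_softmax_attention(q, K, V), naive)
```

The rescaling by exp(m − m_new) is the key step: it lets earlier tiles be folded into the running sums without ever revisiting them, so the result is bit-for-bit the same softmax up to floating-point rounding.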
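For contrast, a sketch of the kernel-trick reordering that option D describes, assuming a simple positive feature map as a stand-in (published linear-attention variants use maps such as elu(x) + 1). Associativity of matrix products is what brings the cost to O(n); whether the output matches softmax attention is exactly what the option asks you to judge.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: softmax(Q K^T) V is replaced by
    phi(Q) @ (phi(K)^T @ V), normalized per row, so the (n, n)
    score matrix never appears and cost is linear in n.
    """
    Qf, Kf = phi(Q), phi(K)                  # feature-mapped queries/keys
    KV = Kf.T @ V                            # (d, d_v): computed in O(n * d * d_v)
    Z = Kf.sum(axis=0)                       # (d,): per-query normalizer terms
    return (Qf @ KV) / (Qf @ Z)[:, None]     # (n, d_v), never forms (n, n)

rng = np.random.default_rng(1)
Q = rng.normal(size=(6, 4))
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 3))
print(linear_attention(Q, K, V).shape)       # (6, 3)
```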