Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM

Question 1 of 4

Intermediate (5/10) · compute · 120 s

For a Vision Transformer with $N = HW/P^2$ patches and embedding dimension $d$, what is the per-layer self-attention cost, and how does it scale with the image side length at a fixed patch size $P$?
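To make the patch-count definition concrete, here is a minimal Python sketch that evaluates $N = HW/P^2$ for a few square images. The function name `num_patches` and the example sizes (224, 448 and 896 pixels with $P = 16$) are illustrative assumptions and are not part of the question. The sketch also assumes a non-overlapping patch grid with $H$ and $W$ divisible by $P$.

```python
def num_patches(height: int, width: int, patch: int) -> int:
    """Return N = H * W / P^2 for a non-overlapping patch grid.

    Hypothetical helper: assumes H and W are exact multiples of P
    (no padding or cropping), as in the standard ViT patch embedding.
    """
    if height % patch or width % patch:
        raise ValueError("H and W must be multiples of the patch size P")
    # (H / P) patches per column times (W / P) patches per row
    return (height // patch) * (width // patch)


# Illustrative square image sizes at a fixed patch size P = 16.
for side in (224, 448, 896):
    print(f"{side}x{side}, P=16: N = {num_patches(side, side, 16)}")
```

With these inputs the loop reports N = 196, 784 and 3136. Each doubling of the side length multiplies the token count N by four, which is the quantity the attention cost is expressed in.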