Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM
Question 1 of 4
For a Vision Transformer with $N$ patches and embedding dimension $D$, what is the per-layer self-attention cost, and how does it scale with image side length $H$ at fixed patch size $P$?
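
One way to explore the question numerically is the rough sketch below. It rests on several assumptions that are not stated in the question: the image is square with side H divisible by patch size P, so the patch count is N = (H/P)^2; the class token, softmax, and normalization are ignored; and one multiply-add is counted as 2 FLOPs. The function names are hypothetical and chosen only for illustration. Under that accounting, the two N-by-N matrix products in attention cost about 4·N²·D FLOPs per layer, while the Q, K, V, and output projections cost about 8·N·D².

```python
def num_patches(side: int, patch: int) -> int:
    """Patch count N for a square image (assumes side is divisible by patch)."""
    if side % patch != 0:
        raise ValueError("image side must be divisible by patch size")
    return (side // patch) ** 2


def attention_flops(n: int, d: int) -> int:
    """Approximate FLOPs for the score matrix Q @ K^T and the weighted sum A @ V.

    Each product takes n * n * d multiply-adds; a multiply-add counts as 2 FLOPs.
    """
    return 4 * n * n * d


def projection_flops(n: int, d: int) -> int:
    """Approximate FLOPs for the Q, K, V, and output projections (four d x d maps)."""
    return 8 * n * d * d


if __name__ == "__main__":
    d, patch = 768, 16  # illustrative values, not taken from the question
    for side in (224, 448, 896):
        n = num_patches(side, patch)
        print(side, n, attention_flops(n, d), projection_flops(n, d))
```

Doubling H at fixed P quadruples N, so the N² term grows by a factor of 16 while the projection term grows by a factor of 4. This is the sense in which the attention term scales as H⁴ under these assumptions.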