Attention Mechanism Theory
10 questions · Difficulty 4-9 · Intermediate (8 intermediate, 2 advanced)
Question 1 / 10 · intermediate (4/10) · conceptual
In the transformer self-attention mechanism, why are the attention scores divided by the square root of the key dimension before applying softmax?
A. The scaling ensures that the attention weights sum to exactly 1.0 after softmax, which would not be guaranteed without normalization by the dimension.
B. The division reduces the computational cost of attention from quadratic to linear in the sequence length by approximating the full attention matrix.
C. Large dot products saturate the softmax function, making gradients vanishingly small. Dividing by sqrt(d_k) keeps the variance of the scores near 1 and softmax in its sensitive region.
D. It compensates for the fact that keys and queries are learned independently, and their dot products would otherwise be biased toward positive values.
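The reasoning behind this question is the variance argument from Vaswani et al. (2017): a dot product of d_k unit-variance components has variance about d_k, so its magnitude grows with sqrt(d_k). Below is a minimal numerical sketch of that argument, assuming queries and keys with i.i.d. standard-normal components; the choice d_k = 512 and all variable names are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: queries and keys with i.i.d. unit-variance
# components, as in the scaled dot-product attention argument.
d_k = 512
q = rng.standard_normal((1000, d_k))
k = rng.standard_normal((1000, d_k))

# Raw dot products: a sum of d_k unit-variance terms, so variance ~ d_k.
raw = (q * k).sum(axis=1)
print(f"raw std    ~ {raw.std():.1f}  (sqrt(d_k) = {np.sqrt(d_k):.1f})")

# Scaled scores: dividing by sqrt(d_k) restores variance ~ 1.
scaled = raw / np.sqrt(d_k)
print(f"scaled std ~ {scaled.std():.2f}")

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - x.max())
    return e / e.sum()

# Softmax over unscaled scores saturates toward a one-hot distribution,
# which is what makes the gradients vanishingly small.
print("softmax(raw[:5])    =", np.round(softmax(raw[:5]), 4))
print("softmax(scaled[:5]) =", np.round(softmax(scaled[:5]), 4))
```

Running the sketch should print a raw standard deviation near sqrt(512) ≈ 22.6 against roughly 1.0 after scaling, and the softmax over the unscaled scores comes out essentially one-hot while the scaled version stays spread out, keeping softmax in its sensitive region.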