Attention Is All You Need (Paper)
Question 1 of 5
Time limit: 120s · Difficulty: intermediate (5/10) · Task: state theorem
Vaswani et al. (2017) introduced the Transformer with scaled dot-product attention. What does the 1/√d_k scaling factor correct for, and why is that correction necessary?
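Hint, as a minimal sketch rather than the paper's implementation: for query and key vectors with i.i.d. zero-mean, unit-variance components, the dot product q·k has variance d_k, so unscaled logits grow with dimension and push the softmax into near-one-hot regions with vanishing gradients. The NumPy demo below (all function and variable names are illustrative, not from the quiz) contrasts attention weights with and without the 1/√d_k scaling:

```python
# Illustrative sketch: why scaled dot-product attention divides by sqrt(d_k).
# Dot products of random d_k-dimensional vectors have variance ~ d_k, so
# unscaled logits saturate the softmax. Names here are hypothetical.
import numpy as np

def scaled_dot_product_attention(Q, K, V, scale=True):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    logits = Q @ K.T
    if scale:
        logits = logits / np.sqrt(d_k)  # restores ~unit logit variance
    # Numerically stable softmax over the key dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d_k = 512
Q = rng.standard_normal((4, d_k))
K = rng.standard_normal((8, d_k))
V = rng.standard_normal((8, d_k))

_, w_scaled = scaled_dot_product_attention(Q, K, V, scale=True)
_, w_raw = scaled_dot_product_attention(Q, K, V, scale=False)

# Unscaled logits have std ~ sqrt(d_k) ≈ 22.6 here, so softmax saturates:
print("max weight, scaled:  ", w_scaled.max())  # moderate, well below 1
print("max weight, unscaled:", w_raw.max())     # ~1.0 (near one-hot)
```

A near-one-hot softmax has gradients close to zero for all but one key, which slows learning; dividing by √d_k keeps the logits in a range where the softmax stays smooth.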