
Attention as Kernel Regression

Difficulty: intermediate (6/10), conceptual
Softmax attention can be interpreted as a Nadaraya-Watson kernel regression estimator with kernel K(q, k_i) = exp(q·k_i / √d). What does the normalization factor Σ_j exp(q·k_j / √d) in the denominator correspond to in the kernel regression interpretation?
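The correspondence can be checked numerically. Below is a minimal sketch, assuming the scaled dot-product kernel K(q, k_i) = exp(q·k_i / √d): computing softmax attention for a single query gives exactly the Nadaraya-Watson estimate, where the softmax denominator plays the role of the kernel-weight normalizer Σ_j K(q, k_j).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
q = rng.standard_normal(d)        # one query vector
K = rng.standard_normal((5, d))   # five key vectors
V = rng.standard_normal((5, d))   # five value vectors

# Softmax attention output for the single query q.
scores = K @ q / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum()
attn_out = weights @ V

# Nadaraya-Watson estimate with kernel K(q, k_i) = exp(q.k_i / sqrt(d)):
#   sum_i K(q, k_i) v_i  /  sum_j K(q, k_j)
# The denominator is the normalizing constant that makes the
# kernel weights sum to one -- the same quantity as the softmax denominator.
kernel = np.exp(K @ q / np.sqrt(d))
nw_out = (kernel[:, None] * V).sum(axis=0) / kernel.sum()

assert np.allclose(attn_out, nw_out)
```

The two expressions are algebraically identical, so the check passes for any random draw; the point is that the softmax denominator is precisely the kernel normalizer of the Nadaraya-Watson estimator.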