DeepSeek Models
DeepSeek-V3 has 671B total parameters and 37B active per token. Which statement most accurately describes the cost implications?
A. Total parameter count is irrelevant for users since only the active subset affects both cost and quality at inference; it's effectively a 37B model in every operational sense.
B. Inference cost matches a 671B dense model on every axis since all parameters are loaded into GPU memory and queried during the forward pass for each token.
C. Inference compute per token is comparable to a 37B dense model, but all 671B parameters must be resident in GPU memory; the efficiency is in compute, not memory.
D. Both compute and memory scale with the 37B active parameters, since inactive experts are paged out to host memory and only loaded on routing demand.
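The quantities the options compare can be put in rough numbers. The following is a minimal back-of-envelope sketch, not a measurement: it assumes the common rule of thumb of about 2 FLOPs per parameter touched in a forward pass, and treats weight storage as parameter count times bytes per parameter. The helper names and the 1-byte and 2-byte storage widths are illustrative assumptions, not figures from the question. It reports both the 671B and 37B figures on each axis and leaves open which figure applies to which axis.

```python
# Back-of-envelope figures for the two parameter counts in the question.
# Assumptions (not from the question): ~2 FLOPs per parameter per token,
# and weight storage = parameter count * bytes per parameter.

TOTAL_PARAMS = 671e9   # total parameters, from the question
ACTIVE_PARAMS = 37e9   # parameters active per token, from the question


def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Bytes needed to store `params` weights at the given width."""
    return params * bytes_per_param


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb forward cost: one multiply and one add per parameter."""
    return 2.0 * params


if __name__ == "__main__":
    for label, n in (("671B", TOTAL_PARAMS), ("37B", ACTIVE_PARAMS)):
        gflops = forward_flops_per_token(n) / 1e9
        gb_1 = weight_bytes(n, 1.0) / 1e9  # hypothetical 1-byte weights
        gb_2 = weight_bytes(n, 2.0) / 1e9  # hypothetical 2-byte weights
        print(f"{label}: ~{gflops:.0f} GFLOPs/token, "
              f"~{gb_1:.0f} GB at 1 byte/param, ~{gb_2:.0f} GB at 2 bytes/param")
    print(f"total/active ratio: {TOTAL_PARAMS / ACTIVE_PARAMS:.1f}x")
```

The ratio between the two counts is about 18x, so the options differ mainly in which of the two figures they assign to compute and which to memory.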