What Each Framework Does
Frequentist and Bayesian inference both aim to learn about unknown parameters from observed data. They differ in what they consider random, what constitutes a valid answer, and how prior information enters the analysis.
Frequentist: The parameter is a fixed (unknown) constant. Data is random. Inference is about the long-run behavior of estimators and test procedures across hypothetical repeated experiments.
Bayesian: The parameter is a random variable with a prior distribution encoding beliefs before seeing data. After observing data, the prior is updated to a posterior via Bayes' theorem. Inference is about the posterior distribution.
Side-by-Side Core Formulas
Maximum Likelihood Estimation (Frequentist)
The frequentist workhorse is maximum likelihood. Given data $x_1, \dots, x_n$ and a model $p(x \mid \theta)$, the MLE is:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
The MLE treats $\theta$ as a fixed parameter to be estimated. Its quality is measured by properties like consistency, efficiency, and bias, defined over hypothetical repeated sampling.
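As a minimal sketch of this definition (assuming a Bernoulli model; all names are hypothetical), the closed-form MLE for a success probability — the sample mean — can be checked against a direct grid maximization of the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3
data = rng.random(10_000) < theta_true  # Bernoulli(0.3) draws

# For Bernoulli data the log-likelihood is
#   l(theta) = sum_i [x_i log(theta) + (1 - x_i) log(1 - theta)],
# which is maximized in closed form by the sample mean.
theta_mle = data.mean()

# Numerical check: evaluate the log-likelihood on a grid and confirm
# the grid maximizer agrees with the closed-form MLE.
grid = np.linspace(0.01, 0.99, 981)
loglik = data.sum() * np.log(grid) + (len(data) - data.sum()) * np.log(1 - grid)
theta_grid = grid[np.argmax(loglik)]

print(theta_mle, theta_grid)
```

The agreement between the analytic maximizer and the brute-force grid search is the point: the MLE is just the argmax of a deterministic function of the observed data.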
Bayesian Posterior (MAP and Full Posterior)
The Bayesian approach applies Bayes' theorem:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \propto p(x \mid \theta)\, p(\theta)$$
The MAP estimate (Maximum A Posteriori) is the mode of the posterior:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} p(\theta \mid x) = \arg\max_{\theta} \left[ \log p(x \mid \theta) + \log p(\theta) \right]$$
But Bayesians typically report the full posterior $p(\theta \mid x)$, not just a point estimate. The posterior encodes all information about $\theta$ given the data and the prior.
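A minimal concrete instance of this update, assuming a Beta prior on a Bernoulli parameter (conjugate, so the posterior is available in closed form; the counts are invented for illustration):

```python
# Beta(a, b) prior on a Bernoulli success probability theta; with
# s successes out of n trials the posterior is Beta(a + s, b + n - s).
a, b = 2.0, 2.0          # weakly informative prior centered at 0.5
n, s = 50, 35            # observed data: 35 successes in 50 trials

a_post, b_post = a + s, b + (n - s)

posterior_mean = a_post / (a_post + b_post)
map_estimate = (a_post - 1) / (a_post + b_post - 2)  # posterior mode
mle = s / n

print(posterior_mean, map_estimate, mle)
```

Both Bayesian point summaries sit between the MLE (0.7) and the prior mean (0.5), showing the prior's pull; and the full posterior Beta(37, 17) carries more than either number alone.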
Where Each Is Stronger
Frequentist wins on objectivity and large-sample theory
Frequentist methods require no prior specification. The MLE depends only on the likelihood, making the analysis independent of subjective beliefs. This is valuable when:
- You want results that are reproducible and not dependent on analyst choice
- The sample size is large enough that the data dominate any prior
- You need formal guarantees (e.g., confidence interval coverage) that hold regardless of the true parameter value
The asymptotic theory of MLEs is powerful: under regularity conditions, the MLE is consistent, asymptotically normal, and asymptotically achieves the Cramér-Rao lower bound (i.e., it is asymptotically efficient).
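That theory can be checked by simulation. A sketch under a Bernoulli model (where the Fisher information is $1/(\theta(1-\theta))$, so the asymptotic standard deviation of the MLE is $\sqrt{\theta(1-\theta)/n}$; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 500, 20_000

# Draw many independent samples and compute the MLE (sample mean) of each.
samples = rng.random((reps, n)) < theta
mles = samples.mean(axis=1)

# Asymptotic theory: Var(theta_hat) ~ theta(1-theta)/n = 1/(n * I(theta)).
asymptotic_sd = np.sqrt(theta * (1 - theta) / n)
print(mles.std(), asymptotic_sd)
```

The empirical spread of the MLE across repeated experiments matches the information-based prediction — exactly the "long-run behavior over hypothetical repetitions" that frequentist guarantees refer to.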
Bayesian wins on coherence and small-sample inference
Bayesian inference provides a complete probability distribution over parameters, enabling natural answers to questions like "what is the probability that $\theta > \theta_0$?" Frequentist inference cannot answer such questions because $\theta$ is not a random variable in that framework.
Bayesian methods are particularly strong when:
- The sample size is small and prior information is genuinely available
- The model is complex (hierarchical models, latent variables)
- You need to quantify uncertainty in a decision-theoretically coherent way
- You want to combine information from multiple studies naturally
Key Concepts That Differ
| | Frequentist | Bayesian |
|---|---|---|
| Parameter | Fixed unknown constant | Random variable with a prior |
| Data | Random (from repeated sampling) | Fixed (once observed) |
| Point estimate | MLE | MAP or posterior mean |
| Interval estimate | Confidence interval | Credible interval |
| Interpretation | Long-run frequency properties | Degree of belief |
| Prior | Not used | Required (explicitly specified) |
| Nuisance parameters | Profiled or plugged in | Marginalized (integrated out) |
MLE vs. MAP: The Connection
MAP Reduces to Regularized MLE
Statement
If the prior is $p(\theta) \propto \exp(-\lambda R(\theta))$ for some penalty function $R$, then:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \left[ \log p(x \mid \theta) - \lambda R(\theta) \right]$$
This is exactly the regularized MLE. A Gaussian prior gives $\ell_2$ regularization (ridge). A Laplace prior gives $\ell_1$ regularization (lasso).
Intuition
The MAP estimate bridges the two frameworks. It adds a prior-derived penalty to the log-likelihood. As $n \to \infty$, the likelihood dominates the prior, and MAP converges to the MLE. With finite data, the prior acts as regularization, pulling the estimate toward regions of high prior density.
Confidence Intervals vs. Credible Intervals
The most commonly confused distinction:
A 95% confidence interval $[L(X), U(X)]$ is a random interval (because it depends on the data) such that across repeated experiments, $P_\theta\big(\theta \in [L(X), U(X)]\big) = 0.95$ for every fixed $\theta$. It does not mean there is a 95% probability that $\theta$ is in the specific interval you computed.
A 95% credible interval $[a, b]$ satisfies $P\big(\theta \in [a, b] \mid x\big) = 0.95$. Given the prior and the data, there is a 95% posterior probability that $\theta \in [a, b]$. This is the interpretation most people incorrectly assign to confidence intervals.
In many common settings (regular models, large samples, diffuse priors), the two intervals are numerically similar. They diverge in small samples, with strong priors, or in models with boundary effects.
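A quick numerical illustration of that similarity (made-up binomial data; the Wald interval stands in for the frequentist side, a flat-prior Beta posterior for the Bayesian side):

```python
import numpy as np

rng = np.random.default_rng(42)

n, s = 100, 62  # hypothetical data: 62 successes in 100 trials
p_hat = s / n

# 95% Wald confidence interval (frequentist, large-sample normal approx).
se = np.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# 95% equal-tailed credible interval under a flat Beta(1, 1) prior;
# the posterior is Beta(1 + s, 1 + n - s), summarized here by Monte Carlo.
draws = rng.beta(1 + s, 1 + n - s, size=200_000)
cri = (np.quantile(draws, 0.025), np.quantile(draws, 0.975))

print(ci, cri)  # numerically close here: moderate n, diffuse prior
```

Shrink `n` to 10 or make the prior sharply informative, and the two intervals separate — their interpretations were different all along.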
When a Researcher Would Use Each
Clinical trial with regulatory requirements
Use frequentist methods. Regulatory agencies (FDA, EMA) require pre-specified hypothesis tests with controlled Type I error rates. Confidence intervals with guaranteed coverage properties are the standard. The objectivity of frequentist methods is a feature in this context.
Hierarchical model for small-area estimation
Use Bayesian methods. When you have data from many related groups (e.g., disease rates across counties), hierarchical Bayesian models naturally share information across groups through the prior. Partial pooling gives better estimates for small-sample groups than either complete pooling or no pooling.
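A toy sketch of partial pooling under a normal-normal model with known variance components (all numbers invented); the partially pooled estimate is each group's posterior mean, a precision-weighted compromise between the group mean and the grand mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy small-area setting: many groups, each with few observations.
# Group effects theta_g ~ N(mu, tau^2); observations y ~ N(theta_g, sigma^2).
n_groups, n_per, mu, tau, sigma = 50, 5, 0.0, 1.0, 2.0
theta = rng.normal(mu, tau, n_groups)
y = rng.normal(theta[:, None], sigma, (n_groups, n_per))

group_means = y.mean(axis=1)              # "no pooling" estimate
grand_mean = group_means.mean()
complete = np.full(n_groups, grand_mean)  # "complete pooling" estimate

# Partial pooling: shrink each group mean toward the grand mean by a
# factor determined by the variance components (posterior mean formula).
shrink = (sigma**2 / n_per) / (sigma**2 / n_per + tau**2)
partial = shrink * grand_mean + (1 - shrink) * group_means

def mse(est):
    return float(np.mean((est - theta) ** 2))

print(mse(group_means), mse(complete), mse(partial))
```

Partial pooling beats both extremes on mean squared error — the hierarchical prior is doing exactly the information-sharing described above.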
Large-scale neural network training
Use frequentist (MLE via gradient descent). With millions of parameters and large datasets, the data dominate any reasonable prior. Bayesian inference over all neural network weights is computationally intractable except via crude approximations (variational inference, MC dropout). Regularization provides the practical benefits of priors without the computational cost of full Bayesian inference.
A/B testing with prior conversion rate data
Either works. Frequentist sequential testing (with corrections for multiple looks) is standard. But Bayesian A/B testing with an informative prior on the baseline conversion rate can reach conclusions faster and provides direct probability statements that stakeholders find more intuitive.
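A sketch of the Bayesian version with conjugate Beta updates (the counts and the Beta(6, 94) prior, centered on a hypothetical 6% historical baseline, are all invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# Observed conversions: variant A 120/2000, variant B 150/2000.
conv_a, n_a = 120, 2000
conv_b, n_b = 150, 2000

# Informative Beta prior from historical data: Beta(6, 94) has mean 0.06.
# With a Beta prior and binomial data, the posterior is again Beta.
a0, b0 = 6.0, 94.0
post_a = rng.beta(a0 + conv_a, b0 + n_a - conv_a, 100_000)
post_b = rng.beta(a0 + conv_b, b0 + n_b - conv_b, 100_000)

# The direct probability statement stakeholders actually ask for:
p_b_better = (post_b > post_a).mean()
print(p_b_better)
```

"There is a ~97% probability B beats A" is a legitimate posterior statement here; a frequentist p-value answers a different (repeated-sampling) question.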
Common Confusions
Bayesian does not mean subjective
The prior can be chosen objectively (reference priors, Jeffreys priors, maximum entropy priors). Many Bayesian analyses use weakly informative priors that have minimal impact on the posterior. The prior is a modeling choice, not necessarily a personal belief.
Frequentist does not mean prior-free
Frequentist methods implicitly depend on modeling choices (the likelihood function, the hypothesis class, the test statistic) that play a role similar to the prior. The prior is just more explicit about where assumptions enter. James-Stein estimation, a frequentist procedure, has a Bayesian interpretation and dominates the MLE in high dimensions.
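A simulation sketch of that dominance in the classic normal-means problem ($x \sim N(\theta, I_d)$, $d \ge 3$; the dimensions and counts here are a toy setup):

```python
import numpy as np

rng = np.random.default_rng(5)

# d-dimensional normal means: observe x ~ N(theta, I_d) and estimate theta.
# The MLE is x itself; James-Stein shrinks x toward zero and has lower
# total squared-error risk for every theta when d >= 3.
d, reps = 10, 20_000
theta = rng.normal(size=d)  # one fixed true mean vector
x = theta + rng.normal(size=(reps, d))

norm_sq = (x ** 2).sum(axis=1, keepdims=True)
js = (1 - (d - 2) / norm_sq) * x  # James-Stein estimator

risk_mle = ((x - theta) ** 2).sum(axis=1).mean()
risk_js = ((js - theta) ** 2).sum(axis=1).mean()
print(risk_mle, risk_js)
```

The MLE's risk is $d$ (here 10), while James-Stein's is strictly smaller — a purely frequentist risk comparison, yet the estimator is exactly what an empirical-Bayes argument with a zero-centered Gaussian prior would produce.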
The posterior is not the sampling distribution
$p(\theta \mid x)$ (the Bayesian posterior) and $p(x \mid \theta)$ viewed as a function of $x$ for fixed $\theta$ (the sampling distribution) are different objects. Confusing them leads to misinterpreting confidence intervals as credible intervals. They are related by Bayes' theorem, but only after specifying a prior.