
Applied ML

NLP for Economic Text Analysis

Text-as-data methods used in economics and finance: dictionary scoring of central-bank statements, topic models on Fed minutes, and transformer embeddings for financial sentiment, with the measurement-validity caveats that determine whether the proxy is interpretable.

Advanced · Tier 3 · ~15 min

Why This Matters

Central-bank statements, earnings calls, 10-K filings, and policy minutes are high-stakes text. A small change in wording in an FOMC statement moves trillions of dollars in fixed-income markets within minutes. Quantifying the content of that text is now a standard step in macro and finance research, and most of the early methods were built before transformers existed. The contemporary stack mixes 1940s-style dictionary scoring with 2020s-style fine-tuned transformers, often in the same paper, because each method buys different things.

The hard part is rarely the model. It is the measurement-validity argument that connects "score produced by model X on document Y" to "the economic quantity I care about." Without that argument, an impressive R² in a sentiment-vs-returns regression may be measuring document length, sector composition, or boilerplate copy instead.

Core Ideas

Dictionary methods. A dictionary assigns each word in a curated list a sign or weight, and the document score is a linear function of word counts. The Loughran-McDonald financial dictionary (Loughran and McDonald 2011, Journal of Finance 66) replaced the Harvard-IV General Inquirer for finance text after showing that "liability," "tax," and "cost" carry different sentiment in 10-Ks than in general English. Hansen, McMahon, and Prat (2018, QJE 133) score FOMC statements with policy-domain dictionaries to recover hawkish-dovish positions. The strengths: interpretability, determinism, auditability. The weakness: insensitivity to negation, syntax, and context.
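The scoring rule above can be sketched in a few lines. This is a minimal illustration of the linear-in-word-counts idea, not the actual Loughran-McDonald implementation; the word lists here are tiny hypothetical stand-ins for the real LM lists.

```python
# Minimal sketch of dictionary sentiment scoring, Loughran-McDonald style.
# NEGATIVE and POSITIVE are illustrative stand-ins, not the real LM lists.
import re

NEGATIVE = {"loss", "impairment", "litigation", "adverse"}
POSITIVE = {"growth", "profit", "improvement"}

def dictionary_score(text: str) -> float:
    """Net tone: (positive count - negative count) / total word count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(dictionary_score("Revenue growth and profit improvement"))  # 0.6
```

Every input-output pair can be audited by hand, which is exactly the property the dictionary approach buys.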

Topic models on policy text. Latent Dirichlet Allocation on Fed minutes recovers latent themes (financial conditions, labor markets, foreign sector) and tracks their prevalence over time. Hansen and McMahon (2016, Journal of International Economics 99) use LDA to separate economic-condition content from forward-guidance content in FOMC communications, finding that forward-guidance shocks have larger effects on real variables than economic-condition shocks. LDA is unsupervised, and because estimation is stochastic, topic stability across runs requires careful seeding and robustness checks.

Transformer embeddings and FinBERT. FinBERT (Araci 2019, arXiv 1908.10063) is BERT further pre-trained on a financial corpus and fine-tuned for sentiment classification on the Financial PhraseBank. Compared to dictionary methods, FinBERT handles negation, sarcasm, and context, and typically improves classification accuracy by 5 to 15 points on financial sentiment benchmarks. The cost is opacity: a sentiment score from a transformer cannot be audited word by word the way a dictionary score can.
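The negation gap between the two approaches can be made concrete with a toy bag-of-words scorer. The word list is hypothetical; the point is only that counting "loss" ignores the "did not" that precedes it, which is the context a sequence model like FinBERT can use.

```python
# Why bag-of-words scoring misreads negation: "loss" counts as negative
# even when the sentence says no loss occurred. Word list is illustrative.
NEGATIVE = {"loss", "decline", "impairment"}

def bag_of_words_tone(text: str) -> int:
    tokens = text.lower().replace(".", "").split()
    return -sum(t in NEGATIVE for t in tokens)

s = "The company did not report a loss this quarter."
print(bag_of_words_tone(s))  # -1: scored negative despite the negation
```

A transformer classifier conditions on the whole token sequence, so "did not report a loss" and "reported a loss" receive different representations; the dictionary scorer sees the same count either way.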

Measurement validity is the bottleneck. A sentiment score is only useful to an economist if it maps onto an interpretable construct. Two threats recur. First, document length and boilerplate dominate raw scores; standard practice is to control for these explicitly. Second, the proxy may correlate with the outcome of interest through a confounding channel: a "hawkish" score on FOMC text may be capturing macroeconomic conditions already known to the market rather than the surprise component of the statement. The credibility of any text-as-data result rests on the identification argument that ties the score to the construct.
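One standard response to the second threat is to residualize the text score on macro conditions already known to the market, keeping only the orthogonal "surprise" component. A sketch with simulated data; all variable names and the data-generating process are hypothetical, and in practice the controls come from real-time data vintages.

```python
# Sketch: extract the "surprise" component of a hawkishness score by
# residualizing on macro conditions known before the statement.
# Data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
macro = rng.normal(size=(n, 2))                    # e.g. inflation, unemployment
surprise = rng.normal(size=n)                      # unobserved true surprise
score = macro @ np.array([0.8, -0.5]) + surprise   # raw text score, confounded

X = np.column_stack([np.ones(n), macro])           # add intercept
beta, *_ = np.linalg.lstsq(X, score, rcond=None)   # OLS of score on controls
residual = score - X @ beta                        # orthogonal to macro controls

# The residual, not the raw score, is the candidate policy surprise
print(np.corrcoef(residual, surprise)[0, 1])       # close to 1 by construction
```

The regression of returns on `residual` then carries the identification argument; the same regression on the raw `score` would partly recover the market's reaction to already-known conditions.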

Common Confusions

Watch Out

Higher transformer accuracy is not better economics

FinBERT can beat Loughran-McDonald on sentiment classification while being worse for an economic application. If the construct of interest is "tone the central bank intended to project," interpretability and stability across training runs may matter more than two extra points of F1. Pick the model that supports the inference, not the leaderboard.

Watch Out

Dictionary scores need length and boilerplate controls

A 10-K with twice as many words as the prior year will mechanically have more positive and negative words. Comparing raw counts across documents or across years without normalization measures filing length, not sentiment. The standard fix is per-word frequencies plus year and firm fixed effects.
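The per-word normalization can be sketched directly: two filings with identical negative-word counts but very different lengths get the same raw count and very different rates. The word list is again an illustrative stand-in.

```python
# Sketch of length normalization: compare negative-word frequency per word,
# not raw counts, across filings of different lengths.
# NEGATIVE is an illustrative stand-in for the real dictionary.
import re

NEGATIVE = {"loss", "impairment", "adverse"}

def neg_rate(text: str) -> float:
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in NEGATIVE for t in tokens) / max(len(tokens), 1)

short = "adverse loss"                         # 2 words, 2 negative hits
long = "adverse loss " + "boilerplate " * 98   # 100 words, same 2 hits

print(neg_rate(short), neg_rate(long))  # 1.0 vs 0.02: same raw count, very different tone
```

Year and firm fixed effects then absorb the remaining level differences in disclosure style that per-word rates alone do not remove.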


Last reviewed: April 18, 2026
