Broadcast Joins in Distributed Compute
Broadcast hash join vs sort-merge join in Spark, Snowflake, and BigQuery. When the dimension table is small enough to ship to every executor, broadcast wins; when it isn't, the driver OOMs.
What It Is
A broadcast join ships one side of a join — usually a small dimension table — to every executor in the cluster, where it is held in memory as a hash table. The large fact table is then scanned once and probed against the local copy. There is no shuffle of the large side. Compare this to a sort-merge join, which partitions both sides on the join key, sorts each partition, and merges. Sort-merge is the safe default when both inputs are large; broadcast is much faster when one side fits in memory but catastrophic when it does not.
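The build-then-probe mechanic can be sketched in plain Python (toy data, hypothetical function name; real engines run the probe in parallel, one broadcast copy per executor):

```python
def broadcast_hash_join(fact_rows, dim_rows, fact_key, dim_key):
    """Build a hash table from the small (broadcast) side, then
    stream the large side once and probe the table row by row."""
    # Build phase: the in-memory copy every executor would hold.
    build = {}
    for row in dim_rows:
        build.setdefault(row[dim_key], []).append(row)
    # Probe phase: one pass over the fact table, no shuffle of it.
    out = []
    for row in fact_rows:
        for match in build.get(row[fact_key], []):
            out.append({**row, **match})
    return out

orders = [{"order_id": 1, "product_id": "A"},
          {"order_id": 2, "product_id": "B"},
          {"order_id": 3, "product_id": "A"}]
products = [{"product_id": "A", "name": "anvil"},
            {"product_id": "B", "name": "rocket"}]

joined = broadcast_hash_join(orders, products, "product_id", "product_id")
```

Note that the fact table is never re-partitioned or sorted; the entire cost of coordination is the one-time build of the hash table from the small side.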
The mechanic is the same across engines. Spark calls it broadcast hash join and gates it on spark.sql.autoBroadcastJoinThreshold (default 10 MB; commonly tuned to 8-100 MB). Snowflake's optimizer chooses broadcast vs hash-redistribute automatically based on cardinality estimates. BigQuery does the same and labels it broadcast join in the query plan. Hive originated the pattern under the name map-side join or map join, where the small table was loaded into a distributed cache and joined inside each mapper without a reduce step.
Broadcast hash join cost scales as O(n + m) per executor, where n is the large-side row count and m is the broadcast payload size, plus O(k·m) in network to ship the small side to k executors. Sort-merge scales as O(n log n + m log m) in compute and pays a full shuffle of both sides. Broadcast wins decisively when m is small enough that the network and memory cost of k copies stays below the shuffle cost saved.
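A back-of-envelope comparison makes the crossover concrete (all cluster figures below are hypothetical illustrations, not benchmarks):

```python
# Broadcast vs sort-merge: rough network bytes moved.
executors = 50
fact_bytes = 500 * 10**9   # 500 GB fact table
dim_bytes = 10 * 10**6     # 10 MB dimension table

# Broadcast: ship one copy of the small side to every executor.
broadcast_network = executors * dim_bytes        # 500 MB total

# Sort-merge: shuffle both sides across the cluster.
shuffle_network = fact_bytes + dim_bytes         # ~500 GB total

print(broadcast_network / shuffle_network)       # a tiny fraction
```

The ratio flips as the dimension table grows: at 1 GB broadcast per executor, 50 executors already means 50 GB of broadcast traffic plus 50 resident in-memory copies, which is when the shuffle starts to look cheap.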
When You'd Use It
Use a broadcast join when joining a fact table against a dimension table, lookup table, or filter set that fits in driver and executor memory. Star-schema workloads on a data warehouse are the canonical case: orders joined to product, customer, or date dimensions. ML feature pipelines that look up entity attributes from a small registry table are the same pattern. If the small side comes from an aggregation that the optimizer already knows is bounded (a LIMIT, a GROUP BY that produces few rows, a constant table), broadcast is the right plan.
Avoid broadcasting when the small side is "small relative to the cluster" but still hundreds of megabytes, when the cluster has many executors so total broadcast network traffic is high, or when the optimizer's row-count estimate is wrong because statistics are stale. In Spark, the broadcast(small_df) function in the DataFrame API and the BROADCAST(small_table) (or its MAPJOIN alias) SQL hint force the choice when the optimizer guesses wrong.
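In Spark SQL the hint form looks like this (table and column names hypothetical):

```sql
-- Force a broadcast of the dimension side regardless of size estimates.
SELECT /*+ BROADCAST(p) */ o.order_id, p.name
FROM orders o
JOIN products p ON o.product_id = p.product_id;
```

The hint overrides the threshold check but not physics: if p genuinely does not fit in memory, forcing the broadcast trades a slow plan for a failed one.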
Notable Gotchas
Broadcasting a too-large table OOMs the driver, not the executor
The driver collects the small side before broadcasting it. A table that exceeds the driver heap (say a 50 GB DataFrame against a 4 GB driver) crashes the entire job during the collect step, before any worker starts probing. Stale Hive metastore statistics or a filter that never got pushed down often produce the mis-estimate. Setting spark.sql.autoBroadcastJoinThreshold = -1 disables automatic broadcast globally and serves as a kill switch.
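A minimal sketch of the kill switch in PySpark, assuming an existing SparkSession named spark:

```python
# Disable automatic broadcast joins entirely; the optimizer then
# falls back to sort-merge (or shuffled hash) for every join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
```

Explicit broadcast hints still win over this setting, so it guards against bad automatic decisions, not against a hint someone hard-coded.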
Skewed join keys defeat broadcast and sort-merge alike
A single hot key (one customer with 90 percent of orders) overloads one executor regardless of join strategy. Spark 3 added Adaptive Query Execution (AQE) skew handling that splits hot partitions; Snowflake spills to disk; BigQuery dynamically repartitions. Salting the key (append a random suffix) is the manual workaround when AQE does not help.
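The salting workaround can be sketched in plain Python (toy data, hypothetical helper names): append a random salt 0..S-1 to the hot key on the large side, and replicate each small-side row S times so every salted variant still finds its match.

```python
import random

SALTS = 4  # number of salt buckets; a tuning knob

def salt_large_side(rows, key):
    """Append a random salt so one hot key spreads over SALTS partitions."""
    return [{**r, "salted_key": (r[key], random.randrange(SALTS))} for r in rows]

def explode_small_side(rows, key):
    """Replicate every row once per salt value so all variants match."""
    return [{**r, "salted_key": (r[key], s)} for r in rows for s in range(SALTS)]

orders = [{"order_id": i, "customer_id": "hot"} for i in range(8)]
customers = [{"customer_id": "hot", "segment": "enterprise"}]

left = salt_large_side(orders, "customer_id")
right = explode_small_side(customers, "customer_id")

# Join on the salted key; every order still finds its customer row,
# but the work now spreads across SALTS partitions instead of one.
lookup = {r["salted_key"]: r for r in right}
joined = [{**l, **lookup[l["salted_key"]]} for l in left]
```

The cost is the S-fold replication of the small side, which is why salting pairs naturally with broadcast: replicating an already-small table is cheap.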
Last reviewed: April 18, 2026