RNNs for Signal Sequences
LSTMs and GRUs applied to non-stationary signal sequences: channel equalization, OFDM symbol detection, RNN-aided Viterbi, and real-time speech enhancement. Why transformers replaced most production RNN signal pipelines after 2020 except under strict latency or memory budgets.
Why This Matters
Signal sequences from radio channels, sonar, vibration sensors, and microphones share two properties that make recurrent networks a natural fit: the underlying physics is causal, and the channel state evolves slowly relative to the symbol rate. An LSTM or GRU can carry a hidden state that tracks fading coefficients, multipath delay spread, or speaker-room acoustics across thousands of samples without the quadratic memory of self-attention.
Before 2020, RNN-based equalizers and detectors were the default learned baseline against which classical Viterbi and MMSE receivers were judged. Farsad and Goldsmith showed that a sliding-window bidirectional LSTM detector matches the Viterbi algorithm on inter-symbol-interference channels and beats it when the channel model is mismatched, with no explicit channel estimation step (IEEE Trans. Signal Process. 66(21), 2018, arXiv:1705.08044).
After 2020, transformer encoders trained with self-supervised pretext tasks took over most large-scale speech and channel-modeling work. RNNs survive in production where latency budgets are tens of microseconds (FPGA receivers), where state must be carried indefinitely (online speech enhancement on hearing aids), or where the model has to fit in a few hundred kilobytes of SRAM.
Core Ideas
A symbol detector treats the received complex baseband samples as the input sequence and, at each step, emits a posterior over the corresponding transmitted symbol. The LSTM hidden state implicitly tracks the channel impulse response without a Kalman update, and a softmax head replaces the trellis search of Viterbi decoding. Bidirectional layers add lookahead at the cost of latency, which is acceptable for OFDM frame decoding but not for streaming voice.
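The data flow above can be sketched as a single LSTM cell with a softmax head. The weights here are random and the class and dimension names are illustrative, so this shows shapes and the recurrence, not a trained receiver.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMSymbolDetector:
    """Minimal unidirectional LSTM symbol-detector sketch (untrained)."""

    def __init__(self, input_dim, hidden_dim, num_symbols, rng):
        # One stacked weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.standard_normal((num_symbols, hidden_dim)) * 0.1
        self.H = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell tracks channel state
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def detect(self, received):
        # received: (T, input_dim) real/imag pairs of baseband samples
        h, c = np.zeros(self.H), np.zeros(self.H)
        posteriors = []
        for x in received:
            h, c = self.step(x, h, c)
            posteriors.append(softmax(self.W_out @ h))  # posterior over symbols
        return np.array(posteriors)
```

A bidirectional variant would run a second cell over the reversed sequence and concatenate the two hidden states before the softmax, which is where the lookahead latency comes from.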
In RNN-aided Viterbi, the network does not replace the decoder; it supplies the per-state branch metric. The trellis structure enforces the channel memory constraint, while the LSTM learns deviations from the assumed Gaussian noise model. This hybrid is harder to beat than either pure approach when the noise has heavy tails or impulsive interference.
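The hybrid structure can be sketched as a standard Viterbi recursion with a pluggable branch metric; in the hybrid receiver the callable would be the LSTM's learned per-branch cost. The trellis encoding and API here are illustrative, not taken from any specific paper's code.

```python
import numpy as np

def viterbi(num_states, T, branch_metric, predecessors):
    """Viterbi decoder with an externally supplied branch metric.

    branch_metric(t, s_prev, s_next) returns a cost (lower is better); in the
    RNN-aided receiver this is the network's learned negative log-likelihood
    instead of a Gaussian distance. predecessors[s] lists allowed prior states,
    so the trellis still enforces the channel-memory constraint.
    """
    cost = np.full(num_states, np.inf)
    cost[0] = 0.0                      # assume the trellis starts in state 0
    back = np.zeros((T, num_states), dtype=int)
    for t in range(T):
        new_cost = np.full(num_states, np.inf)
        for s_next in range(num_states):
            for s_prev in predecessors[s_next]:
                c = cost[s_prev] + branch_metric(t, s_prev, s_next)
                if c < new_cost[s_next]:
                    new_cost[s_next] = c
                    back[t, s_next] = s_prev
        cost = new_cost
    # Traceback from the cheapest terminal state.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

Because only the metric is learned, swapping a Gaussian metric back in recovers the classical decoder unchanged.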
For real-time speech enhancement, a frame-online GRU operates on log-mel features at a 10 ms hop. The state carries noise-floor and pitch estimates across frames, so the network does not need to re-estimate them on every frame. RNNoise (Valin, 2018) ships in WebRTC and Discord with under 100 kB of weights and one-frame algorithmic latency, a regime where a transformer with positional encoding is awkward.
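The frame-online pattern can be sketched with a single GRU cell whose hidden state persists between calls. The weights are random and the sizes and names are illustrative; this is not RNNoise's actual architecture, only the one-frame-in, one-gain-mask-out streaming shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StreamingGRUDenoiser:
    """Frame-online GRU sketch: one feature frame in, one gain mask out."""

    def __init__(self, n_features, hidden, rng):
        k = n_features + hidden
        self.Wz = rng.standard_normal((hidden, k)) * 0.1       # update gate
        self.Wr = rng.standard_normal((hidden, k)) * 0.1       # reset gate
        self.Wh = rng.standard_normal((hidden, k)) * 0.1       # candidate state
        self.Wg = rng.standard_normal((n_features, hidden)) * 0.1
        self.h = np.zeros(hidden)   # persists across frames: noise floor, pitch, ...

    def process_frame(self, feats):
        xh = np.concatenate([feats, self.h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        hc = np.tanh(self.Wh @ np.concatenate([feats, r * self.h]))
        self.h = (1 - z) * self.h + z * hc   # state carried to the next frame
        return sigmoid(self.Wg @ self.h)     # per-band gain in (0, 1)
```

The algorithmic latency is exactly one frame: each call depends only on the current frame and the carried state, with no lookahead buffer.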
OFDM symbol detection is a useful contrast: each OFDM symbol is independent after the cyclic prefix removes ISI, so the temporal model only needs to track slow channel aging across symbols. A small unidirectional LSTM over pilot subcarriers can replace least-squares channel estimation plus zero-forcing equalization with lower MSE when subcarriers are correlated.
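For concreteness, here is the classical baseline that paragraph describes: least-squares estimates on pilot subcarriers, linearly interpolated across the band. The channel model, sizes, and noise level are illustrative; a learned estimator would replace the interpolation step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sc, pilot_step = 64, 4
pilots = np.arange(0, n_sc, pilot_step)                  # pilot subcarrier indices
h_true = np.fft.fft(rng.standard_normal(4), n_sc) / 2    # smooth 4-tap channel
x_pilot = np.ones(n_sc, dtype=complex)                   # known pilot symbols
noise = 0.05 * (rng.standard_normal(n_sc) + 1j * rng.standard_normal(n_sc))
y = h_true * x_pilot + noise

# Least-squares estimate at the pilots, then linear interpolation across
# subcarriers: the classical baseline a small LSTM over pilots can replace
# when the subcarrier correlation is worth exploiting.
h_ls = y[pilots] / x_pilot[pilots]
h_hat = np.interp(np.arange(n_sc), pilots, h_ls.real) \
      + 1j * np.interp(np.arange(n_sc), pilots, h_ls.imag)
mse = np.mean(np.abs(h_hat - h_true) ** 2)
```

Linear interpolation ignores both the noise on each pilot and the known smoothness of a short impulse response, which is exactly the structure a sequence model over subcarriers can learn.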
Common Confusions
LSTMs do not have unbounded memory
The cell state can in principle persist indefinitely, but in practice gradients vanish over a few hundred steps and the forget gate decays the state. For sequences longer than a few thousand samples, segment the input or use a state-space model.
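Segmenting costs nothing in the forward pass: carrying the final state of each chunk into the next reproduces the full-sequence outputs exactly, as this minimal tanh-RNN sketch shows. During training you would additionally stop gradients at the segment boundary (truncated backpropagation through time); the sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
H, D, T, SEG = 8, 4, 1000, 100
W = rng.standard_normal((H, D + H)) * 0.3

def run(xs, h):
    """Simple tanh RNN over a block of inputs; returns outputs and final state."""
    out = []
    for x in xs:
        h = np.tanh(W @ np.concatenate([x, h]))
        out.append(h)
    return np.array(out), h

xs = rng.standard_normal((T, D))
full, _ = run(xs, np.zeros(H))          # one pass over the whole sequence

# Segmented pass: carry the final state of each chunk into the next.
h = np.zeros(H)
chunks = []
for start in range(0, T, SEG):
    out, h = run(xs[start:start + SEG], h)
    chunks.append(out)
seg = np.concatenate(chunks)            # identical to `full`, element for element
```

What segmentation does change is the gradient horizon: no parameter update sees dependencies longer than one segment, which is the practical memory limit the paragraph describes.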
Replacing the decoder is not the same as augmenting it
A pure RNN classifier discards the algebraic structure of the channel code. RNN-aided Viterbi keeps the trellis and only learns the metric, which preserves the maximum-likelihood guarantee under the assumed model and adds robustness to model mismatch.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Recurrent Neural Networks (Layer 3)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Signals and Systems for ML (Layer 1)