
Computer Vision for Intelligence Imagery

Object detection, segmentation, change detection, and geospatial foundation models on overhead and full-motion video imagery. Sensor heterogeneity and domain shift are the binding constraints, not architecture choice.


Why This Matters

Overhead imagery is the largest source of openly published machine-learning labels in the geospatial domain, and it has properties that differ sharply from natural images. Targets are small relative to the frame (often 5-50 pixels on a side), object orientation is arbitrary because a nadir view has no gravity-induced "up", the visible spectrum is only one of many sensing channels, and acquisition angle, sun angle, and atmospheric haze vary from scene to scene. A detector that scores 0.9 mAP on COCO will struggle on these inputs without architectural and training choices made for this regime.

Full-motion video (FMV) and ground imagery add temporal context but shift the problem to tracking, re-identification, and event recognition under heavy occlusion and motion blur. The intelligence-imagery stack in the open literature is built around a handful of public datasets and a growing set of geospatial foundation models, with active work on domain adaptation across sensors and acquisition geometries.

Core Methods

Object detection on overhead imagery. xView (Lam et al. 2018, arXiv 1802.07856) released about a million annotated objects across 60 classes in 0.3-meter-resolution WorldView-3 imagery. The dataset exposed the small-object and class-imbalance problems that drive design choices: high-resolution backbones, anchor scales tuned to small targets, and class-balanced sampling. DOTA (Xia et al. 2018) added rotated bounding boxes for objects whose orientation matters (ships, vehicles).
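Class-balanced sampling can be sketched concretely. The snippet below computes per-class sampling weights with the "effective number of samples" heuristic (Cui et al. 2019), one common choice for long-tailed label distributions like xView's; the function name and toy label counts are illustrative, not from any particular codebase.

```python
from collections import Counter

def class_balanced_weights(labels, beta=0.999):
    """Per-class sampling weights via the effective-number heuristic
    (Cui et al. 2019): weight proportional to (1 - beta) / (1 - beta**n_c).
    Rare classes get larger weights, common classes smaller ones."""
    counts = Counter(labels)
    weights = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    # Normalize so the weights sum to the number of classes.
    total = sum(weights.values())
    k = len(weights)
    return {c: w * k / total for c, w in weights.items()}

# Toy label distribution: cars dominate, helipads are rare.
labels = ["car"] * 900 + ["building"] * 90 + ["helipad"] * 10
w = class_balanced_weights(labels)
```

In a detector training loop these weights would scale the per-class loss or the probability of drawing a chip containing that class.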

Segmentation on satellite. The SpaceNet challenges (Van Etten et al. 2018, arXiv 1807.01232) released building footprints, road networks, and later flood and off-nadir imagery across multiple cities, providing a benchmark that rewards models robust to acquisition angle and city morphology. U-Net variants and HRNet remain strong baselines; the binding factor is again label noise and inter-city generalization.

Change detection. Bitemporal change detection compares aligned image pairs to flag pixels that have changed semantically (new construction, land-use shift, infrastructure damage) while suppressing nuisance variation from illumination, season, and viewing geometry. Methods range from siamese segmentation networks to differencing in foundation-model embedding space. The xBD dataset (Gupta et al. 2019) specifically targets post-disaster damage assessment.
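Differencing in embedding space can be illustrated with a toy sketch: compare per-cell feature vectors from two aligned acquisitions by cosine distance and flag cells above a threshold. The dict layout, threshold value, and function name are assumptions for illustration; in practice the embeddings would come from a pre-trained backbone.

```python
import math

def cosine_change_map(emb_t0, emb_t1, threshold=0.3):
    """Flag changed cells by cosine distance between per-cell embeddings
    from two aligned acquisitions. Embedding-space differencing is more
    robust to illumination and season than raw pixel differencing,
    because nuisance variation moves embeddings less than semantic
    change does. emb_t0, emb_t1: dicts mapping cell id -> vector."""
    changed = set()
    for cell, v0 in emb_t0.items():
        v1 = emb_t1[cell]
        dot = sum(a * b for a, b in zip(v0, v1))
        n0 = math.sqrt(sum(a * a for a in v0))
        n1 = math.sqrt(sum(b * b for b in v1))
        if 1.0 - dot / (n0 * n1) > threshold:
            changed.add(cell)
    return changed

# Cell "a" keeps its direction (scale changes are ignored by cosine);
# cell "b" rotates to an orthogonal vector and is flagged.
flags = cosine_change_map({"a": [1.0, 0.0], "b": [1.0, 0.0]},
                          {"a": [2.0, 0.0], "b": [0.0, 1.0]})
```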

Geospatial foundation models. Prithvi (Jakubik et al. 2023, arXiv 2310.18660), released by NASA and IBM, is a temporal vision transformer pre-trained on Harmonized Landsat and Sentinel-2 (HLS) data, with downstream fine-tuning shown for flood mapping, fire-scar detection, and crop classification. SatMAE (Cong et al. 2022, arXiv 2207.08051, NeurIPS 2022) is a masked autoencoder pre-trained on multi-spectral and temporal satellite stacks. Both target the regime where labels are scarce but unlabeled imagery is abundant, and both reduce the per-task labeling burden by fine-tuning instead of training from scratch.
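The core pre-training mechanic behind these models is masked reconstruction. A minimal sketch of the masking step, assuming the MAE default mask ratio of 0.75 on a 14x14 patch grid; SatMAE additionally structures masking across spectral groups and time steps, which is not shown here.

```python
import random

def mae_mask(num_patches, mask_ratio=0.75, seed=0):
    """MAE-style random patch masking: the encoder sees only the
    visible patches and a lightweight decoder reconstructs the masked
    ones. Returns (visible, masked) index lists for one image."""
    rng = random.Random(seed)
    idx = list(range(num_patches))
    rng.shuffle(idx)
    n_mask = int(num_patches * mask_ratio)
    return sorted(idx[n_mask:]), sorted(idx[:n_mask])

# 224x224 image with 16x16 patches -> 196 patch tokens.
visible, masked = mae_mask(196)
```

Because the encoder processes only ~25% of the tokens, pre-training is cheap relative to supervised training on full images, which is what makes pre-training on large unlabeled archives tractable.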

Watch Out

Domain shift between sensors is not solved by more data

A model trained on WorldView-3 panchromatic imagery does not transfer cleanly to Sentinel-2 multispectral imagery, even when both cover the same area on the same day. Spatial resolution, spectral bands, sensor noise, and processing-level artifacts each induce a distribution shift. Geospatial foundation models help by exposing a shared embedding, but fine-tuning data from the target sensor is still required for production accuracy.
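A common first-order baseline before any fine-tuning is to align per-band statistics between sensors. The sketch below matches one band's mean and standard deviation to reference values; this is a generic normalization trick, not a method from the cited papers, and it cannot fix resolution or spectral-band mismatch, only first- and second-order statistics.

```python
import statistics

def match_band_stats(src_band, ref_mean, ref_std):
    """Shift and scale one band's pixel values so its mean and standard
    deviation match a reference sensor's per-band statistics. A cheap
    cross-sensor normalization baseline, a starting point before
    fine-tuning on target-sensor labels."""
    m = statistics.mean(src_band)
    s = statistics.pstdev(src_band) or 1.0  # guard against flat bands
    return [(x - m) / s * ref_std + ref_mean for x in src_band]

out = match_band_stats([10.0, 20.0, 30.0], ref_mean=0.5, ref_std=0.1)
```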

Watch Out

mAP on overhead imagery is not mAP on COCO

The same number means different things. Overhead detectors are evaluated at small object scales and with class distributions dominated by a few common classes (cars, buildings) and a long tail of rare ones. A reported mAP needs a per-class breakdown and a per-scale breakdown to support any deployment claim.
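The per-class breakdown is straightforward to compute alongside the headline number. The sketch below computes all-points average precision per class and reports both the mean and the per-class dict; function names and the toy detections are illustrative.

```python
def average_precision(scored_dets, num_gt):
    """AP for one class, all-points form: scored_dets is a list of
    (confidence, is_true_positive) pairs; num_gt is the number of
    ground-truth objects of that class."""
    dets = sorted(scored_dets, key=lambda d: -d[0])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, hit in dets:
        tp += hit
        fp += not hit
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

def map_with_breakdown(per_class):
    """per_class: dict class -> (scored_dets, num_gt). Returns
    (mAP, per-class AP dict) so a headline mAP can be audited
    against rare-class performance."""
    aps = {c: average_precision(d, n) for c, (d, n) in per_class.items()}
    return sum(aps.values()) / len(aps), aps

mAP, aps = map_with_breakdown({
    "car": ([(0.9, True), (0.8, True)], 2),
    "helipad": ([(0.7, False), (0.6, True)], 2),
})
```

Here a respectable-looking mAP of 0.625 hides a helipad AP of 0.25; the same per-scale slicing (small/medium/large target areas, as in the COCO protocol) is needed before any deployment claim.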

References

Lam et al. xView 2018

Lam, Kuzma, McGee, Dooley, Laielli, Klaric, Bulatov, McCord. "xView: Objects in Context in Overhead Imagery." arXiv 1802.07856 (2018). Dataset description, label process, and baseline detector results on 0.3-meter-resolution WorldView-3 imagery.

Van Etten SpaceNet 2018

Van Etten, Lindenbaum, Bacastow. "SpaceNet: A Remote Sensing Dataset and Challenge Series." arXiv 1807.01232 (2018). Building, road, and city-scale segmentation challenges across multiple cities and acquisition conditions.

Jakubik Prithvi 2023

Jakubik, Roy, Phillips, Fraccaro et al. "Foundation Models for Generalist Geospatial Artificial Intelligence." arXiv 2310.18660 (2023). The NASA-IBM Prithvi model card and downstream evaluations on flood, fire scar, and crop tasks.

Cong SatMAE 2022

Cong, Khanna, Meng, Liu, Rozi, He, Burke, Lobell, Ermon. "SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery." NeurIPS 2022. arXiv 2207.08051. Masked-autoencoder pre-training for multi-spectral and temporal satellite stacks.

Xia DOTA 2018

Xia, Bai, Ding et al. "DOTA: A Large-scale Dataset for Object Detection in Aerial Images." CVPR 2018. Rotated-bounding-box annotations for orientation-sensitive overhead targets.

Gupta xBD 2019

Gupta, Hosfelt, Sajeev et al. "xBD: A Dataset for Assessing Building Damage from Satellite Imagery." arXiv 1911.09296 (2019). Bitemporal disaster-damage benchmark used for change-detection and damage-classification work.


Last reviewed: April 18, 2026