Representation Learning in Cosmology
Self-supervised pretraining on simulations and survey data: contrastive embeddings for galaxy morphology, AstroCLIP-style multimodal models, and cross-survey transfer for photometric redshift and galaxy property estimation.
Why This Matters
Astronomy has a labels-are-expensive problem and a data-are-cheap problem at the same time. Spectroscopic redshifts, morphological classifications, and physical parameters (stellar mass, star-formation rate, metallicity) require expensive follow-up or careful modeling. Imaging, by contrast, is plentiful: surveys produce galaxy cutouts where the only "label" is the pixels themselves.
Self-supervised pretraining converts the imaging glut into a backbone that downstream tasks fine-tune with small labeled sets. The pretrained representation captures morphology, color, and structural features that are useful across photo-z estimation, classification, and anomaly hunting. The same playbook that took NLP from task-specific RNNs to GPT-class foundation models is now standard in survey astronomy.
The cross-survey transfer angle matters because surveys differ in depth, filter set, PSF, and noise. A representation that generalizes from SDSS to DES to LSST avoids retraining from scratch each time a new instrument comes online. Robust embeddings are also what enables similarity-based anomaly search across entire galaxy catalogs.
Core Ideas
Contrastive learning for galaxy morphology. Hayat et al. (2021, ApJL 911; arXiv 2012.13083) applied SimCLR-style contrastive pretraining to SDSS galaxy images. With augmentations covering rotations, flips, and photometric reddening, the learned representation produced photo-z estimates matching supervised baselines while using only a fraction of the labeled data, and morphology classifications competitive with the Galaxy Zoo CNN baseline. The dominant modes in the embedding space corresponded to physical axes (color, size, ellipticity) without supervision.
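The SimCLR-style recipe can be sketched in a few lines: symmetry augmentations that are safe for galaxy cutouts (rotations, flips), plus the NT-Xent loss that pulls two augmented views of the same cutout together against in-batch negatives. This is a minimal numpy illustration under assumed toy shapes, not Hayat et al.'s actual pipeline.

```python
import numpy as np

def augment(img, rng):
    """Random 90-degree rotation and horizontal flip -- symmetry
    augmentations appropriate for galaxy cutouts (orientation on
    the sky carries no physical meaning)."""
    img = np.rot90(img, k=rng.integers(4))
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return img.copy()

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (SimCLR) loss: each of the 2N embeddings should be
    most similar to its augmented partner, against 2N-2 negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), targets].mean()
```

Embeddings of two views of the same galaxy (near-duplicates) should yield a much lower loss than embeddings of unrelated galaxies, which is the training signal the backbone optimizes.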
Multimodal foundation models: AstroCLIP and friends. Parker et al. (2024, MNRAS 531; arXiv 2310.03024) trained AstroCLIP on paired galaxy images and spectra from DESI, using a CLIP-style contrastive objective across modalities. The shared embedding supports image-to-spectrum retrieval, spectrum-conditioned image generation, and zero-shot regression on stellar mass and redshift. SpectraGPT and SpectraFM extend the same idea with transformer backbones over tokenized spectra.
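The CLIP-style objective underlying AstroCLIP is a symmetric InfoNCE loss over a batch of paired image and spectrum embeddings: row i of each modality should match row i of the other, and retrieval is just nearest-neighbor search in the shared space. The sketch below is a generic numpy rendering of that objective, not the AstroCLIP codebase; all names and shapes are illustrative.

```python
import numpy as np

def clip_loss(img_emb, spec_emb, temperature=0.07):
    """Symmetric cross-modal InfoNCE: paired image/spectrum rows are
    positives, all other in-batch rows are negatives."""
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()            # diagonal = positives

    return 0.5 * (xent(logits) + xent(logits.T))

def retrieve(img_emb, spec_emb):
    """Image-to-spectrum retrieval: index of nearest spectrum by
    cosine similarity in the shared embedding space."""
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)
```

Once trained, the same shared space supports the zero-shot uses mentioned above: a regression head or k-NN lookup on the embedding replaces a task-specific model.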
Self-supervised pretraining on simulations. Pretraining on outputs of hydrodynamic simulations (IllustrisTNG, EAGLE) gives the network priors that match physical scaling relations before it ever sees real data. Sarmiento et al. (2021, ApJ 921) and follow-up work used this for stellar-population inference. The risk is the simulation-to-observation gap: representations that overfit to simulator artifacts transfer poorly.
Cross-survey transfer. A representation trained on SDSS imaging and fine-tuned with a few hundred labeled DES galaxies can match models trained from scratch on DES labels (Walmsley et al. 2022, MNRAS 509). The practical impact is on rare-class detection (mergers, ring galaxies, strong lenses) where labeled examples are scarce in any single survey. Foundation models trained jointly on multiple surveys are the natural next step and are under active development for Rubin LSST commissioning.
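The transfer step described above amounts to freezing the pretrained backbone and fitting a small head on a few hundred labeled target-survey examples. A minimal sketch, assuming the backbone's embeddings are already computed: multinomial logistic regression in numpy on synthetic features standing in for those embeddings (the function name and hyperparameters are illustrative, not from Walmsley et al.).

```python
import numpy as np

def fit_linear_head(features, labels, n_classes, lr=0.5, steps=500):
    """Fit a softmax classification head on frozen backbone features
    by full-batch gradient descent -- the cheap fine-tuning step for
    a new survey where labels are scarce."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                        # softmax cross-entropy grad
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```

For rare classes (mergers, lenses), the same head can be fit with class weighting or replaced by k-NN search in embedding space; the point is that only this small head, not the backbone, needs target-survey labels.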
Common Confusions
Linear probe accuracy is not downstream task accuracy
A high linear-probe score on a benchmark like Galaxy Zoo does not guarantee that the representation transfers to a different survey, a different morphology task, or a regression target like redshift. Evaluation should match the deployment setting: same survey, same noise level, same class distribution. Generic linear-probe leaderboards inflate apparent transfer.
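The failure mode is easy to reproduce on synthetic data: a linear probe fit on clean "source survey" embeddings degrades when the same objects are re-embedded under heavier noise, mimicking a shallower target survey. Everything below is a toy construction to illustrate the point, not real survey embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for pretrained embeddings: two classes separated
# along one embedding axis, plus nuisance dimensions.
n, d = 400, 32
y = rng.integers(2, size=n)
X_source = rng.normal(size=(n, d))
X_source[:, 0] += 4.0 * y

# Linear probe: closed-form ridge regression on centered features,
# labels mapped to +/-1, fit only on the source survey.
mu = X_source.mean(axis=0)
Xc = X_source - mu
w = np.linalg.solve(Xc.T @ Xc + np.eye(d), Xc.T @ (2 * y - 1))

def probe_accuracy(X, y):
    """Accuracy of the frozen source-survey probe on embeddings X."""
    return (((X - mu) @ w > 0).astype(int) == y).mean()

# "Target survey": same objects observed with heavier noise.
X_target = X_source + rng.normal(scale=2.0, size=(n, d))
```

The probe's source-survey score stays high while its target-survey score drops, which is exactly why evaluation should be run under the deployment survey's noise and class distribution rather than read off a generic leaderboard.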
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Contrastive Learning (Layer 3)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Self-Supervised Vision (Layer 4)
- Vision Transformer Lineage (Layer 4)
- Transformer Architecture (Layer 4)
- Attention Mechanism Theory (Layer 4)
- Softmax and Numerical Stability (Layer 1)
- Convolutional Neural Networks (Layer 3)
- Vectors, Matrices, and Linear Maps (Layer 0A)