
Modal: Serverless GPU Platform

Reference for Modal Labs: function decorators, container builds, deployment patterns, GPU types, and how it compares to RunPod, Replicate, Beam, and Lambda Labs.

What It Is

Modal is a serverless cloud platform for Python, founded by Erik Bernhardsson (formerly of Spotify) in 2021. Instead of provisioning servers or Kubernetes pods, the user writes Python functions decorated with @app.function(...) and Modal builds the container, schedules a worker, and runs the function. The platform spans CPU jobs, GPU inference, batch processing, and HTTPS endpoints behind one SDK.

The execution model is function-first: every unit of work is a Python function. Functions can be called like local functions (f.remote(x)), spawned async (f.spawn(x)), mapped over an iterable (f.map(args)), or exposed as a web endpoint (@modal.web_endpoint). A collection of related functions plus shared resources (volumes, secrets, scheduled triggers) is an app.
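A minimal sketch of this calling model (the app name and `square` function are illustrative; the decorators and call styles follow Modal's documented SDK):

```python
import modal

app = modal.App("demo-app")  # an app groups functions and shared resources

@app.function()
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    print(square.remote(3))              # blocking remote call, like a local call
    call = square.spawn(4)               # async; fetch the result later with call.get()
    results = list(square.map(range(5)))  # fan out one worker per input
```

Running `modal run demo.py` executes `main` locally while each `square` invocation runs in a Modal container.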

Container images are built declaratively in Python: modal.Image.debian_slim().pip_install("torch", "transformers").apt_install("git"). The image spec is hashed, so identical specs reuse cached layers. For more control, Image.from_dockerfile(...) ingests a normal Dockerfile.

The category is "serverless GPU." Direct competitors:

  • RunPod: cheaper raw GPU rental, less Pythonic, exposes Docker containers and a serverless layer; better for users who already have container infrastructure.
  • Replicate: deploy-a-model service built around the Cog packaging format; better for shipping a single model behind an API, weaker for general Python jobs.
  • Beam Cloud: similar function-first model to Modal, smaller team and ecosystem.
  • Lambda Labs: traditional GPU rental (1-click instances or reserved clusters), no serverless layer; best for long-running training, not bursty inference.
  • Hyperbolic, Fly.io GPUs, Cloudflare Workers AI: smaller niches.

When You'd Use It

Use Modal for inference endpoints with bursty traffic (the platform spins workers up and down so you do not pay for idle GPUs), batch jobs that need 100-10000 parallel workers (the .map API plus per-second billing makes this trivial), and quick experiments where setting up RunPod or AWS feels heavyweight.

Anti-patterns: do not run multi-day distributed training jobs on Modal; the per-second billing is convenient but ends up more expensive than a reserved Lambda Labs or CoreWeave cluster for sustained workloads. Do not use Modal for state-heavy services that need a long-lived database connection; the serverless model fits stateless functions best.

Deployment patterns: modal run for one-off invocations, modal deploy to create a persistent app with a stable URL, and @app.function(schedule=modal.Cron("0 9 * * *")) for cron-style triggers. Volumes (modal.Volume) and Network File Systems (modal.NetworkFileSystem) provide persistent storage between invocations.
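A sketch combining a cron trigger with a persistent volume (the app and volume names are illustrative; `schedule=`, `volumes=`, and `Volume.commit` are Modal's documented API):

```python
import modal

app = modal.App("nightly-report")
vol = modal.Volume.from_name("report-data", create_if_missing=True)

@app.function(schedule=modal.Cron("0 9 * * *"), volumes={"/data": vol})
def daily_job():
    # Runs at 09:00 UTC every day; files under /data persist between invocations.
    with open("/data/last_run.txt", "w") as f:
        f.write("ok")
    vol.commit()  # flush writes back to the shared volume
```

`modal deploy` registers the schedule as part of the persistent app; `modal run` would execute the function once without installing the cron trigger.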

GPU types as of 2026: T4, L4, A10G, A100 (40 GB and 80 GB), H100, H200, and B200 on a waitlist. Cold start latency for an unloaded container ranges from ~5 seconds (small CPU image) to 30+ seconds (PyTorch + 10 GB model weights). Setting keep_warm=N keeps N containers always-on at full per-second cost; this is the single most common bill surprise.

Notable Gotchas

Watch Out

keep_warm pools bill 24x7

A function with keep_warm=4 on an A100 80 GB runs four GPUs around the clock at full price. Forgetting to remove or downscale a warm pool after a launch can multiply a monthly bill by 10x. Audit keep_warm settings before any deploy and prefer min_containers=0 with longer cold starts for low-traffic apps.
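The billing math is easy to sanity-check. Assuming an illustrative on-demand rate of $4.50 per A100 80 GB GPU-hour (actual Modal pricing varies and should be checked on their pricing page):

```python
# Hypothetical rate -- check Modal's current pricing page for real numbers.
RATE_PER_GPU_HOUR = 4.50   # USD, illustrative A100 80 GB on-demand rate
warm_containers = 4        # keep_warm=4 -> four GPUs billed around the clock
hours_per_month = 24 * 30

monthly_cost = warm_containers * hours_per_month * RATE_PER_GPU_HOUR
print(f"${monthly_cost:,.0f}/month")  # $12,960/month at the assumed rate
```

At that rate, a forgotten four-GPU warm pool costs roughly $13K a month before a single request is served, which is why auditing keep_warm before deploys matters.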

Watch Out

Container builds are not Dockerfiles

Modal's Image API looks similar to Dockerfile syntax but is not Docker. Layer caching, build context, and base-image selection are all handled by Modal's builder. A Dockerfile that works locally may produce a different image when wrapped via Image.from_dockerfile. For maximum portability, write the image as a .py spec, not a Dockerfile.
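A sketch of the recommended .py-spec style (package names and the env var are illustrative). Since the spec is hashed for caching, keeping the rarely-changing steps early in the chain maximizes layer reuse when you edit later steps:

```python
import modal

# Stable system deps first, then Python deps, then frequently-tweaked config,
# so edits near the end of the chain invalidate as few cached layers as possible.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("git")
    .pip_install("torch", "transformers")
    .env({"HF_HOME": "/root/.cache/huggingface"})
)
```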

Last reviewed: April 18, 2026
