Infrastructure
Python for ML Research
Reference card for the Python ML stack in 2025: package managers (uv, pip, conda, poetry), framework distribution, the editor stack, and common pitfalls.
What It Is
Python has been the default language of ML research since roughly 2015, when the combination of NumPy, scikit-learn, and Theano displaced MATLAB for academic deep-learning work. PyTorch (released by Facebook, now Meta, in 2016) and TensorFlow (Google, 2015) cemented Python's position. The language is interpreted, dynamically typed, single-threaded at the bytecode level by default (the GIL), and relies almost universally on C, C++, or Rust extensions for tensor and other compute-heavy operations.
The 2025 ML stack is layered: a numerical kernel (NumPy / PyTorch / JAX) wrapped by domain libraries (Hugging Face Transformers, scikit-learn, Lightning) wrapped by experiment scaffolding (Hydra configs, wandb logging, Modal or SLURM job submission). Python serves as glue; the heavy work happens in compiled kernels.
Package management has consolidated around uv (Astral, 2024) as the new default for most greenfield ML projects. uv is a Rust reimplementation of pip and virtualenv that is roughly 10-100x faster than pip for environment creation, with a pyproject.toml-first workflow. Conda still dominates anywhere CUDA / cuDNN system bindings need to be reproducible across platforms (genomics, scientific Python). Poetry has lost ground to uv since 2024. Plain pip + venv remains the lowest-common-denominator option.
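A minimal uv-style project file is a plain pyproject.toml; uv sync then creates a .venv and a uv.lock lockfile, and uv run python train.py executes inside that environment. The project name and pinned versions below are illustrative, not prescriptive:

```toml
[project]
name = "my-experiment"          # illustrative project name
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "torch>=2.4",               # example pins; choose versions per project
    "transformers>=4.44",
]
```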
When You'd Use It
PyTorch is the dominant research framework: Papers With Code numbers from 2024 put PyTorch at roughly 75-80% of new research-paper implementations, JAX around 10-15%, TensorFlow under 10% and falling. JAX wins for projects that need explicit functional transforms (jit, vmap, pmap, grad) or work at Google / DeepMind / Anthropic; PyTorch wins everywhere else. TensorFlow remains in production at companies that adopted it pre-2020.
The editor stack: VS Code with the Pylance extension is the most common setup. Cursor (a VS Code fork with stronger LLM integration) has gained share since 2024 among solo researchers. Jupyter notebooks remain dominant for exploration, but most production training code now lives in plain .py files invoked by a launcher; long-lived notebooks are an anti-pattern for anything that will be re-run.
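A launcher-friendly .py entrypoint of the kind described above can be sketched with nothing but the standard library; the flag names and defaults here are illustrative:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Keeping parser construction separate makes the CLI testable
    # without touching sys.argv.
    parser = argparse.ArgumentParser(description="training entrypoint")
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    print(f"training {args.epochs} epochs at lr={args.lr}")


if __name__ == "__main__":
    main()
```

A SLURM or Modal launcher then invokes this as `python train.py --epochs 50`, which is the re-runnable shape that long-lived notebooks lack.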
For type checking, pyright (Microsoft, used by Pylance) is faster than mypy and has better inference, but mypy still has the larger plugin ecosystem (e.g. mypy --strict is a common CI gate).
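The kind of inference both checkers perform can be shown with a small narrowing example (function names here are illustrative): pyright and mypy both flag multiplying an Optional value, and both narrow the type after an explicit None check.

```python
from typing import Optional


def head(xs: list[int]) -> Optional[int]:
    return xs[0] if xs else None


def double_head(xs: list[int]) -> int:
    h = head(xs)
    # return h * 2  would be flagged here: h may be None
    if h is None:
        return 0
    return h * 2  # after the check, h is narrowed to int
```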
Notable Gotchas
Mutable default arguments
def f(x, cache=[]): shares a single list across every call to f: the default object is created once, when the function is defined, not on each call. Use cache=None as a sentinel and initialize the list inside the body. This bites every Python programmer at least once.
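The pitfall and its fix can be demonstrated side by side (bad_append and good_append are illustrative names):

```python
def bad_append(x, cache=[]):
    # The default list is created once, at function definition time,
    # and shared by every call that omits the cache argument.
    cache.append(x)
    return cache


def good_append(x, cache=None):
    # None as a sentinel: a fresh list is built on each call.
    if cache is None:
        cache = []
    cache.append(x)
    return cache


print(bad_append(1))   # [1]
print(bad_append(2))   # [1, 2]  -- same list as the first call
print(good_append(1))  # [1]
print(good_append(2))  # [2]
```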
GIL, threading, and multiprocessing
The Global Interpreter Lock means Python threads cannot execute pure-Python bytecode in parallel. Threads still help for I/O (network, disk, subprocess waits) because the GIL is released during blocking syscalls. For CPU-bound parallelism use multiprocessing or run inside a C extension that releases the GIL (NumPy, PyTorch DataLoader workers). Python 3.13 introduced an experimental free-threaded (no-GIL) build, but production ML is still on the GIL-default interpreter as of 2026.
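The I/O case can be sketched with the standard library alone; time.sleep stands in for a blocking network or disk call (it releases the GIL, as real blocking syscalls do), so the eight waits overlap instead of serializing:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n: int) -> int:
    # sleep releases the GIL, like a blocking read or network call
    time.sleep(0.1)
    return n * n


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_io, range(8)))
elapsed = time.perf_counter() - start
# Eight 0.1 s waits overlap: total wall time is ~0.1 s, not 0.8 s.
```

For a CPU-bound function, the same ThreadPoolExecutor pattern gives no speedup under the GIL; concurrent.futures.ProcessPoolExecutor (with the work defined at module top level and launched under an if __name__ == "__main__" guard) is the drop-in multiprocessing equivalent.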
References
Related Topics
Last reviewed: April 18, 2026