SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.

SkyPilot abstracts away infra burdens:

Launch dev clusters, jobs, and serving on any infra
Easy job management: queue, run, and auto-recover many jobs

SkyPilot supports multiple clusters, clouds, and hardware (the Sky):

Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds
Flexible provisioning of GPUs, TPUs, CPUs, with auto-retry

SkyPilot cuts your cloud costs & maximizes GPU availability:

Autostop: automatic cleanup of idle resources
Managed Spot: 3-6x cost savings using spot instances, with preemption auto-recovery
Optimizer: 2x cost savings by auto-picking the cheapest & most available infra

SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

Install with pip:

To get the latest features and fixes, use the nightly build or install from source:

Currently supported infra (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere):

Getting Started

You can find our documentation here.

SkyPilot in 1 Minute

A SkyPilot task specifies: resource requirements, data to be synced, setup commands, and the task commands.

Once written in this unified interface (YAML or Python API), the task can be launched on any available cloud. This avoids vendor lock-in and makes it easy to move jobs to a different provider.
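To make the four parts of a task concrete, here is a minimal sketch that assembles them into YAML text using only the Python standard library. It does not use SkyPilot's own Python API. The `TaskSpec` class and its `to_yaml` method are hypothetical helpers, and the values (accelerator type, paths, commands) are placeholder assumptions. The top-level keys `resources`, `workdir`, `setup`, and `run` are intended to match SkyPilot's task YAML, but check the documentation for the full schema.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the four parts of a task."""
    accelerators: str          # resource requirements, e.g. "A100:8"
    workdir: str               # local directory whose contents are synced
    setup: list[str] = field(default_factory=list)  # commands run before the job
    run: list[str] = field(default_factory=list)    # the task commands

    def to_yaml(self) -> str:
        """Render the spec as YAML text without third-party libraries."""
        def block(key, commands):
            # YAML literal block scalar: one indented line per command.
            body = "\n".join(f"  {c}" for c in commands)
            return f"{key}: |\n{body}\n"

        return (
            "resources:\n"
            f"  accelerators: {self.accelerators}\n"
            "\n"
            f"workdir: {self.workdir}\n"
            "\n"
            + block("setup", self.setup)
            + "\n"
            + block("run", self.run)
        )


# Placeholder values, assumed for illustration only.
spec = TaskSpec(
    accelerators="A100:8",
    workdir="~/my_project",
    setup=["pip install -r requirements.txt"],
    run=["python train.py"],
)
task_yaml = spec.to_yaml()
print(task_yaml)
```

Writing `task_yaml` to a file gives a task definition that names no provider, which is what lets the same task move between clouds.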

Paste the following into a file my_task.yaml:

Prepare the workdir by cloning:

Launch with sky launch (note: access to GPU instances is needed for this example):
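For scripted use, the launch step can also be driven from Python by building the command line and handing it to `subprocess`. This is a sketch, not part of SkyPilot: `build_launch_command` is a hypothetical helper, `mycluster` is a placeholder cluster name, and the snippet only prints the command rather than running it.

```python
import shlex


def build_launch_command(task_file, cluster_name=None):
    """Return the argv list for `sky launch`, optionally naming the cluster."""
    cmd = ["sky", "launch"]
    if cluster_name is not None:
        cmd += ["-c", cluster_name]  # -c gives the cluster a name
    cmd.append(task_file)
    return cmd


cmd = build_launch_command("my_task.yaml", cluster_name="mycluster")
print(shlex.join(cmd))  # prints: sky launch -c mycluster my_task.yaml
```

On a machine with SkyPilot installed and cloud credentials configured, passing `cmd` to `subprocess.run(cmd, check=True)` would start the launch.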

SkyPilot then performs the heavy lifting for you, including: