NVIDIA H100 Cloud Rental Pricing Comparison 2025: The Ultimate Guide to On-Demand, Reserved, and Spot Instances Across Major Providers


Executive Summary

As of December 2025, hourly on-demand pricing for the NVIDIA H100 (80 GB) varies widely across cloud providers, from roughly $1.80 to $8.00 per GPU-hour depending on provider, region, and commitment type. On the lower end, marketplace and specialized GPU-cloud platforms such as Vast.ai and RunPod offer the H100 for as little as ~$1.87–1.99/hr per GPU. On the higher end, traditional hyperscalers such as Amazon Web Services (AWS) still list rates of roughly $7.50–8.00/hr per GPU in certain regions, although aggressive mid-2025 price cuts have pushed many instances into the ~$3.90–4.20/hr range.

For reserved or committed-use workloads, providers such as Lambda Labs report per-GPU rates dropping to ~$1.85–2.30/hr under long-term commitments.

  • Cheapest on-demand: Vast.ai, RunPod, and other marketplace/specialist clouds.

  • Best long-term value (reserved/committed): Lambda Labs, possibly other specialist providers or negotiated reserved deals at hyperscalers.

  • Spot / preemptible lowest cost: marketplaces and specialist clouds, though spot reliability varies widely and is often undocumented.

Key 2025 trends include aggressive price cuts by hyperscalers (e.g., AWS), increasing competition from GPU-cloud specialists driving downward pressure, and widening regional price divergences. H100 remains the dominant workhorse GPU, even as newer architectures (e.g., Blackwell-based B200/GB200) begin to enter the market — but availability, compatibility, and enterprise fleet inertia keep H100 in high demand.


Introduction & Market Context

The H100 (Hopper architecture) continues to be the gold standard GPU for AI/ML training and inference in 2025 — and for good reasons. Its 80 GB HBM3 memory, 4th-gen Tensor Cores, and NVLink/InfiniBand connectivity make it ideal for large-scale model pre-training, fine-tuning, and high-throughput inference. While newer GPUs based on the Blackwell architecture (e.g., B200, GB200) are beginning to appear, hardware fleet upgrades take time. As a result, H100 remains dominant in cloud offerings. Many organizations still base their tooling, networking, and workflows around H100 — and the availability and price reductions in 2025 reinforce that inertia.

From a supply/demand perspective, 2025 saw a surge in competition. Hyperscalers like AWS, Google Cloud, and Microsoft Azure aggressively cut H100 rates to compete with specialist GPU-cloud providers and marketplaces. Meanwhile, cloud-native GPU-only providers (Lambda Labs, RunPod, Vast.ai, etc.) scaled up capacity significantly, driving rates down. As a result, market fragmentation widened: large enterprises still gravitate to hyperscalers for stability, whereas startups, research labs, and smaller users increasingly adopt marketplaces.

Another important dimension is the distinction between H100 SXM (the data-center variant with NVLink/InfiniBand) and H100 PCIe (more generic, with less inter-GPU connectivity). This difference significantly impacts pricing, performance (multi-GPU topology), and suitability for different workloads. Providers often offer both variants, which contributes to the observed price variance.


Methodology

To compile this guide, we gathered data from a combination of:

  • Official provider pricing pages (when publicly available).

  • Third-party trackers and aggregators that monitor GPU-cloud pricing (blog analysis, crowd-sourced platforms, and marketplaces).

  • Marketplace sites where pricing fluctuates (specialist GPU-cloud vendors such as Lambda Labs, RunPod, Vast.ai, etc.).

We categorized prices into three classes:

  • On-demand (no commitment) — pay-as-you-go hourly/second billing.

  • Reserved / Committed-Use — discounted hourly rates under long-term commitments (e.g., 1–3 years).

  • Spot / Preemptible — highly discounted but interruptible / variable availability workloads (mainly marketplaces).

Regions considered include: US East (N. Virginia / Ohio), US West (California / Oregon), Asia-Pacific (e.g., Tokyo), and global marketplace coverage (where applicable — marketplace rates often vary by exact host/region). Where possible, we normalize multi-GPU instances (e.g., 8×H100) to per-GPU equivalent pricing.
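As a concrete illustration of that normalization step, the short Python sketch below divides an instance's hourly price by its GPU count; the example figures are the AWS p5.48xlarge numbers from Table 1.

```python
# Normalize a multi-GPU instance price to an effective per-GPU hourly rate.

def per_gpu_rate(instance_price_per_hr: float, gpu_count: int) -> float:
    """Effective $/GPU-hr for an N-GPU instance billed as a single unit."""
    return instance_price_per_hr / gpu_count

# Example: AWS p5.48xlarge (8x H100) at ~$55.04/hr, as listed in Table 1.
print(f"${per_gpu_rate(55.04, 8):.2f}/GPU-hr")  # -> $6.88/GPU-hr
```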

We note that cloud GPU pricing remains dynamic: some providers apply sustained-use discounts or run sales, and spot-market rates fluctuate. The rates below are accurate as of late November / December 2025, based on publicly available data.


Detailed Pricing Comparison Tables

Table 1: On-Demand 8×H100 Instances (per hour and effective per-GPU)

| Provider | Instance / SKU | Price (total per hour) | Effective per-GPU ($/GPU-hr) |
| --- | --- | --- | --- |
| Amazon Web Services (AWS) | p5.48xlarge (8× H100) | ~$55.04/hr (in many regions) | ~$6.88 |
| Amazon Web Services (AWS), post-cut (some regions) | p5.48xlarge | ~$31.46/hr | ~$3.93 |
| CoreWeave | 8× H100 HGX (InfiniBand) | $49.24/hr | ~$6.16 |
| Oracle Cloud | BM.GPU.H100.8 (bare-metal 8× H100) | ~$80.00/hr (as per third-party reports) | ~$10.00 |
| Lambda Labs | 8× H100 SXM cluster | $23.92/hr (8 × $2.99) | ~$2.99 |

Notes:

  • AWS pricing varies significantly by region. The lower $31.46/hr per-instance figure appears in capacity-block pricing sheets for certain regions.

  • CoreWeave’s 8-GPU node includes high-speed networking (InfiniBand), potentially justifying a higher price than more basic clusters.

Table 2: Single-GPU H100 Pricing (where available)

| Provider | Instance / SKU | Price per GPU-hr (on-demand) |
| --- | --- | --- |
| Google Cloud | a3-highgpu-1g (1× H100 80 GB) | ~$11.06/hr |
| Microsoft Azure | NC_H100_v5 (1× H100 80 GB VM) | ~$6.98/hr |
| Lambda Labs | 1× H100 (SXM or PCIe), when offered | $2.49–$3.29/hr depending on config |

Note: Many providers only expose multi-GPU H100 SKUs, so normalization to per-GPU equivalents is necessary to approximate single-GPU cost.

Table 3: Spot / Preemptible 8×H100 (or per-GPU) Pricing + (where known) Interruption Notes

| Provider / Marketplace | Typical Spot / Marketplace Rate (per GPU-hr) | Notes / Interruption Risk |
| --- | --- | --- |
| RunPod (community / secure cloud) | ~$1.99/hr (PCIe H100, community) / ~$2.39/hr (secure) | Variable availability; per-second billing suits bursty workloads. RunPod notes “no hourly minimum.” |
| Vast.ai (various hosts) | ~$1.87/hr (marketplace low price) | Fully marketplace: availability and preemption risk vary; may require custom images. |
| Specialist GPU-cloud / “neo-cloud” providers | $1.80–$2.25/hr (where publicly reported) | Users should expect potential interruptions; spot-like behavior depends on the provider. |

Publicly available data rarely includes a documented average interruption rate for spot/preemptible H100 capacity, so no reliable table entry for that metric could be compiled.

Table 4: 1-Year / 3-Year Reserved / Committed-Use H100 Pricing (effective hourly) — where disclosed

| Provider | Commitment Term | Effective per-GPU-hr |
| --- | --- | --- |
| Lambda Labs | 1 year+ | ~$1.85/hr (lowest published) |
| Lambda Labs | 6–12 months | ~$2.19–$2.29/hr (depending on tenure / bulk) |

Note: Public committed-use pricing for most hyperscalers (AWS, GCP, Azure) is not transparently published for H100 as of December 2025 — such reserved-instance or savings-plan rates typically require custom quoting; therefore, they are not included above.


Price Analysis & Insights

Cheapest On-Demand & Single GPU Providers (2025)

  • Marketplaces & specialist GPU-clouds (Vast.ai, RunPod, Lambda Labs) clearly lead on cost efficiency: with H100 as cheap as ~$1.87–$2.99/hr per GPU, they are often 3–4× cheaper than hyperscaler on-demand prices, especially in regions where AWS/GCP remain high.

  • Lambda Labs stands out for managed clusters — at ~$2.99/GPU-hr for 8× H100 SXM clusters, its per-GPU cost is significantly below AWS, CoreWeave, and Oracle Cloud.

  • For users needing single-GPU instances (e.g., small experiments, inference), Azure (~$6.98/hr) and Google Cloud (~$11.06/hr) remain expensive relative to specialist clouds or normalized multi-GPU rates.

Best Value for Long-Term / High-Utilization Workloads

  • For committed workloads — e.g., sustained model training over weeks or months — reserved pricing from specialist providers (like Lambda Labs) offers the lowest cost per GPU-hour (~$1.85–2.30/hr).

  • Considering typical server utilization (e.g., 60–70%), reserved cloud rental often beats the cost of purchasing and maintaining on-prem H100 hardware, especially once capital costs, facility overheads, and maintenance are factored in (a rough break-even sketch follows below).
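To make the rent-versus-buy comparison concrete, here is a minimal sketch of the break-even arithmetic. The reserved rate and utilization come from the figures above; the hardware and overhead numbers are purely illustrative assumptions, not quotes, and should be replaced with your own estimates.

```python
# Rough rent-vs-buy sketch for a single H100.
# Capital and overhead figures are illustrative assumptions, not vendor quotes.

HOURS_PER_YEAR = 8760

def rental_cost_per_year(rate_per_gpu_hr: float, utilization: float) -> float:
    """Cloud rental: you pay only for the hours you actually use."""
    return rate_per_gpu_hr * HOURS_PER_YEAR * utilization

def ownership_cost_per_year(capex_per_gpu: float, amortization_years: float,
                            overhead_per_year: float) -> float:
    """On-prem: capex is amortized, and overhead accrues whether or not the GPU is busy."""
    return capex_per_gpu / amortization_years + overhead_per_year

reserved_rate = 2.00   # $/GPU-hr, within the ~$1.85-2.30 reserved range cited above
utilization = 0.65     # 65% average utilization (middle of the 60-70% range above)
capex = 30_000.0       # assumed per-GPU share of server purchase price (illustrative)
overhead = 4_000.0     # assumed power/cooling/maintenance per GPU-year (illustrative)

print(f"Rental:    ${rental_cost_per_year(reserved_rate, utilization):,.0f} per GPU-year")
print(f"Ownership: ${ownership_cost_per_year(capex, 3, overhead):,.0f} per GPU-year")
```

Under these assumptions rental comes to roughly $11,400 per GPU-year versus about $14,000 for ownership; with different capex or higher utilization the conclusion can flip, which is exactly why the utilization figure matters.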

Spot / Preemptible Arbitrage & Reliability Trade-offs

  • Spot or marketplace rates (Vast.ai, RunPod, etc.) present significant arbitrage potential: up to ~70–80% discount versus hyperscaler on-demand, ideal for workloads that can tolerate interruptions (e.g., training with checkpointing, batch inference, data processing). A minimal checkpointing sketch follows after this list.

  • However, the downside is unreliable availability — with no public interruption-rate metrics for most providers, users should treat spot clusters as opportunistic, not reliable for production inference or latency-sensitive workloads.
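As noted above, checkpointing is what makes spot capacity viable for training. Below is a minimal, framework-agnostic sketch of an interruption-tolerant loop; the assumption that the host sends SIGTERM shortly before preemption, the checkpoint path, and the step counts are all placeholders that vary by provider and framework.

```python
# Minimal interruption-tolerant training loop (sketch; provider behavior varies).
import json
import os
import signal

CKPT_PATH = "checkpoint.json"   # in practice, put this on persistent/network storage
stop_requested = False

def request_stop(signum, frame):
    """Many hosts send SIGTERM shortly before preempting a spot instance (assumption)."""
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, request_stop)

def load_step() -> int:
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)["step"]
    return 0

def save_step(step: int) -> None:
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CKPT_PATH)   # atomic rename: a mid-write kill cannot corrupt the file

for step in range(load_step(), 100_000):
    # train_one_step() would run here; model/optimizer state belongs in the checkpoint too.
    if step % 500 == 0 or stop_requested:
        save_step(step)
    if stop_requested:
        break   # resume later from the last saved step on a fresh instance
```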

Regional Price Differences & Hyperscaler Behavior

  • Hyperscaler pricing remains region-dependent. For example, AWS’s capacity-block pricing lists p5.48xlarge at ~$31.46/hr in many regions, but in certain regions the price is higher (~$55/hr).

  • The mid-2025 aggressive price cuts from hyperscalers (e.g., AWS) were likely motivated by increased competition from specialist GPU-clouds.

  • Because of these trends, some regions (especially outside the US) may still see higher rates due to demand and capacity constraints; in contrast, marketplace providers with global distribution may offer more uniformly low pricing.

Impact of Blackwell Launch on H100 Pricing

  • Although Blackwell-based GPUs (e.g., B200, GB200) are entering the market, H100 demand remains high. Due to the continued demand and fleet inertia (software, tooling, compatibility), H100 pricing saw downward pressure — but not collapse. As of late 2025, many providers maintain H100 pricing in the $2–4/hr per GPU range on-demand, indicating H100 remains cost-competitive and widely usable.

  • Specialist providers — already heavily invested in H100 — are unlikely to retire H100 fleets soon, which should keep supply adequate and prices stable or trending downward.


Performance & Feature Comparison (Beyond Just Price)

When selecting H100 instances, price is only part of the story. For many AI engineers and enterprises, the performance architecture (topology, networking, storage, MIG support) matters equally — especially for distributed training, multi-node scaling, or inference serving. Below are key dimensions beyond price.

  • NVLink topology (inter-GPU connectivity):

    • Providers offering H100 SXM in HGX or DGX-like nodes (e.g., CoreWeave, Lambda Labs) often enable full NVLink / NVSwitch connectivity among GPUs in a node — ideal for large multi-GPU training, gradient-sync efficiency, and memory pooling. CoreWeave’s 8× H100 HGX nodes are explicitly advertised with high bandwidth.

    • Marketplace / PCIe-based offerings (e.g., RunPod, Vast.ai) may lack NVLink or offer only default PCIe interconnect — sufficient for independent GPU jobs or loosely-coupled tasks (e.g., many small training jobs, inference), but suboptimal for large-scale distributed training requiring high inter-GPU bandwidth.

  • Networking (InfiniBand vs Ethernet):

    • High-end HPC / cloud-native providers like CoreWeave typically pair H100 HGX with InfiniBand or RoCE networking, enabling multi-node distributed training with low-latency, high-bandwidth communication.

    • Marketplace hosts often use standard Ethernet — acceptable for single-node use or small clusters, but network bandwidth may become a bottleneck for synchronized multi-node training or large data transfers.

  • Storage performance:

    • Premium cloud providers often combine H100 nodes with fast local NVMe or SSD storage and large RAM / vCPU allocations, ideal for data-intensive workloads (datasets, caching, scratch space). E.g., Lambda Labs lists 22 TiB SSD storage along with high vCPU and RAM allocations for H100 clusters.

    • Marketplace instances may have more modest storage and I/O performance; users must verify storage tiers, egress rates, and persistent volume support.

  • Multi-Instance GPU (MIG) support and pricing implications:

    • H100 (SXM) supports MIG (or similar partitioning) in some configurations, which is useful for sharing one GPU among multiple smaller tasks (e.g., inference shards, several smaller models in parallel). However, not all providers expose MIG in their cloud offerings, and even when they do, pricing might not scale linearly (e.g., per-instance overhead). As of 2025, public documentation of MIG-enabled H100 cloud instances remains sparse; a rough per-slice cost sketch follows after this list.

    • For PCIe-based H100 instances on marketplaces, MIG might be unsupported — making them less suitable for fine-grained multi-tenant workloads.
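To show why MIG matters for pricing, here is a small per-slice cost sketch. It takes the ~$2.99/GPU-hr figure cited above as the whole-GPU rate and assumes an even split into 1g.10gb slices; real providers may not expose MIG at all, and per-slice economics rarely divide this cleanly.

```python
# Rough MIG per-slice cost sketch (illustrative; assumes the provider exposes MIG
# and that the whole-GPU hourly rate can be attributed evenly across busy slices).

def cost_per_busy_slice(gpu_rate_per_hr: float, slices_busy: int) -> float:
    """Effective $/hr attributed to each busy MIG slice."""
    return gpu_rate_per_hr / max(1, slices_busy)

rate = 2.99     # $/GPU-hr, the per-GPU cluster rate cited in Table 1
max_slices = 7  # an 80 GB H100 supports up to seven 1g.10gb MIG instances

print(f"All {max_slices} slices busy: ${cost_per_busy_slice(rate, max_slices):.2f} per slice-hr")
print(f"Only 4 slices busy: ${cost_per_busy_slice(rate, 4):.2f} per slice-hr")
```

The point of the sketch: partitioning only pays off when most slices stay busy; otherwise a cheaper whole GPU from a marketplace may cost less per unit of work.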

In short: if you need high-performance multi-GPU training, distributed training at scale, or heavy GPU interconnect — prefer providers offering H100 SXM + NVLink + InfiniBand. If you need low-cost, flexible GPU usage (e.g., bursty jobs, inference, small experiments) — marketplace/PCIe instances may suffice.


Hidden Costs & Gotchas

While the GPU-hour price is the headline figure, several oft-overlooked factors can materially affect total cost or suitability.

  • Data transfer / egress fees: Hyperscalers (AWS, GCP, Azure, Oracle) typically charge for data egress out of the cloud region, which can significantly increase total cost if you move large datasets (training data, checkpoints, model output). Marketplace providers may or may not include egress fees; many only advertise the GPU-hour price without clarifying data transfer costs. A rough total-cost sketch covering egress and billing granularity follows after this list.

  • Premium support and enterprise SLAs: Hyperscalers offer enterprise-grade SLAs, support, and guarantees — but often at a premium. This is rarely reflected in standard on-demand GPU-hour rates. For mission-critical workloads, the incremental cost may be non-trivial.

  • Minimum billing increments / idle time waste: Some providers (especially hyperscalers) may bill in hourly increments even if actual usage is lower, or have minimum times after start-up. This can lead to wasted spend if GPUs sit idle. Marketplace providers often bill per second — which is more efficient for short jobs. E.g., RunPod advertises per-second billing and “no hourly minimums.”

  • Queue times and availability: On marketplaces, cheap spot or community GPUs may be available only intermittently. During periods of high demand, queue times may increase, or availability may drop — which can delay jobs. For production workloads (e.g., inference serving), this unpredictability may be unacceptable.

  • Hidden overhead (CPU, RAM, storage, disk I/O): Low-cost offerings may skimp on CPU, RAM, or storage I/O capabilities, which can bottleneck GPU workloads — especially for data loading, pre-processing, or high-throughput inference.

  • Licensing and long-term discounts complexity: While reserved or committed-use pricing offers big savings, obtaining these rates often requires negotiations, minimum GPU counts, or enterprise commitments — which may not be feasible for small teams or short-lived projects.
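To see how these line items stack up, the sketch below combines the GPU-hour rate, an egress fee, and billing granularity for a short fine-tuning job. The per-GPU rates come from the tables above; the egress price, job profile, and hourly-rounding behavior are illustrative assumptions rather than claims about any specific provider's billing.

```python
# Rough total-cost sketch for a short job, covering two "hidden" items above:
# egress fees and billing granularity. Figures marked "assumed" are illustrative.
import math

def gpu_cost(rate_per_hr: float, runtime_hr: float, per_second_billing: bool) -> float:
    """Per-second billing charges actual runtime; hourly billing rounds up (worst case)."""
    billed_hours = runtime_hr if per_second_billing else math.ceil(runtime_hr)
    return rate_per_hr * billed_hours

def egress_cost(gb_out: float, price_per_gb: float) -> float:
    return gb_out * price_per_gb

runtime_hr = 1.4           # a 1 h 24 min job (assumed)
checkpoints_out_gb = 150   # checkpoints copied out of the cloud afterwards (assumed)

# Marketplace-style: low rate, per-second billing, assume no egress fee.
marketplace = gpu_cost(1.99, runtime_hr, per_second_billing=True)

# Higher-rate provider: per-GPU rate from Table 1, hourly rounding assumed (worst case),
# plus an assumed ~$0.09/GB internet egress fee (check your provider's price sheet).
premium = gpu_cost(6.88, runtime_hr, per_second_billing=False) + egress_cost(checkpoints_out_gb, 0.09)

print(f"Marketplace total:      ${marketplace:.2f}")
print(f"Premium-provider total: ${premium:.2f}")
```

The exact numbers matter less than the shape of the result: for short jobs with large outputs, egress and billing rounding can rival the cost of the GPU time itself.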


Recommendations by Workload Type

Here’s a rough guideline based on typical AI workloads, to help you choose the right provider / pricing model in 2025:

  • Short experiments / prototyping (hours → days):

    • Use marketplace or specialist clouds (Vast.ai, RunPod) for minimal cost and maximal flexibility. The low per-GPU-hour cost and per-second billing make them ideal.

    • If you need stable performance and some GPU interconnect but small scale — consider Lambda Labs single-GPU or small-cluster offerings.

  • Training large models (weeks → months):

    • For large-scale or multi-GPU distributed training, use SXM-based H100 clusters with NVLink + InfiniBand (e.g., CoreWeave, Lambda Labs). The network and inter-GPU connectivity will pay off in better training efficiency.

    • For sustained usage, negotiate reserved / committed-use pricing — this often yields the lowest per-GPU-hour cost (e.g., ~$1.85–2.30/hr).

  • Inference / serving (real-time or batch):

    • If inference workloads are bursty or unpredictable, marketplace or PCIe instances might be sufficient.

    • For latency-sensitive or enterprise-grade production, choose hyperscaler or specialist offerings with high network reliability, dedicated resources, and enterprise SLAs.

  • Academic / research use (flexible, cost-sensitive):

    • Marketplace providers like Vast.ai or RunPod offer the most cost-effective entry point.

    • For larger research groups needing multi-GPU training, specialist GPU-cloud providers with reserved or discounted plans (e.g., Lambda Labs) offer a good balance of performance and price.


Future Outlook (2026–2027)

Looking ahead to 2026–2027, several dynamics are likely to shape H100 cloud rental pricing and usage:

  • As Blackwell-based GPUs (e.g., B200, GB200) ramp up in supply and enter more cloud provider offerings, we may start seeing a price divergence: H100 (older but abundant) trending downward, Blackwell priced at a modest premium initially.

  • Providers may increasingly offer mixed fleets (H100 + B200/GB200), allowing users to choose based on workload needs (e.g., H100 for existing pipelines, Blackwell for newer large models).

  • For H100 PCIe vs SXM, we may see clearer segmentation: PCIe instances (lower cost, less interconnect) for small jobs & inference; SXM for heavy training workloads — leading to differentiated pricing, like “H100-standard” and “H100-HPC.”

  • Demand for spot/marketplace GPU capacity is likely to grow — especially among startups, academic labs, and open-source communities — which may put further downward pressure on on-demand prices.

  • Potential emergence of new entrants (GPU-cloud startups, decentralized GPU marketplaces), increasing supply and competition, further benefiting users via lower cost or more flexibility.

Nevertheless, despite new GPU generations, H100 is likely to remain relevant for many years — given inertia in software stacks, widespread deployment, and the fact that for many workloads, H100 remains “good enough.”


Conclusion

As of December 2025, the cloud-rental landscape for NVIDIA H100 is more competitive than ever. With marketplace providers offering GPUs for under $2/hr, and specialist GPU-cloud providers delivering robust clusters for ~$3/hr per GPU, the cost barrier for AI/ML training and inference has dropped significantly. Hyperscalers remain relevant for large organizations needing stability, enterprise SLAs, and global footprint — but they are no longer the only, nor always the cheapest, option.

For AI engineers, startup CTOs, researchers or enterprise decision-makers: the optimal provider depends heavily on workload profile. Short, bursty jobs — go for marketplace/specialist clouds; long-term, heavy training — negotiate reserved deals; production inference — weigh reliability and networking features; multi-GPU distributed training — favor SXM + NVLink + InfiniBand.

As we move into 2026–2027, expect H100 pricing to continue trending downward, especially with the arrival of newer architectures like Blackwell. However, H100’s entrenched ecosystem means it will remain a mainstay for AI infrastructure for years to come.

 

Disclaimer: Cloud GPU prices are highly dynamic and depend on region, availability, demand, and provider-specific factors. The rates cited above are accurate as of December 2025 using publicly available data — readers should verify current pricing before making procurement decisions.