Stable Diffusion XL (SDXL) and Flux.1 have raised the bar for open-source image generation, but they've also raised the VRAM bar. SDXL at full quality needs 8–16 GB, and Flux.1 Dev pushes you to 24 GB or more. If your local GPU is struggling, cloud GPUs let you generate at full resolution without compromising quality or resorting to quantized workarounds.
## What GPU Do You Need for Stable Diffusion?
VRAM is the primary constraint for image generation. SDXL (1024×1024 native resolution) needs at least 8 GB VRAM for basic generation and 12–16 GB for full quality without memory optimization tricks. Flux.1 Dev and Schnell are significantly more demanding — expect 20–24 GB for comfortable full-resolution generation at batch size 1.
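If you do need to squeeze SDXL into less VRAM, the diffusers library exposes several of those memory optimization tricks. A minimal sketch; the model ID and flags below are standard diffusers usage, but verify them against your installed version:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in half precision to roughly halve weight memory.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Offload submodules to CPU between steps; trades speed for VRAM.
pipe.enable_model_cpu_offload()

# Decode the VAE in slices to avoid a large activation spike at 1024x1024.
pipe.enable_vae_slicing()

image = pipe("a lighthouse at dusk, volumetric light",
             num_inference_steps=20).images[0]
image.save("out.png")
```

With both options enabled, SDXL can run on cards well below the 12–16 GB "full quality" threshold, at the cost of slower generation.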
## GPU Specs for Stable Diffusion: RTX 4090 vs A40 vs L40S
| GPU | VRAM | Memory Bandwidth | SDXL Speed (img/min) | Flux.1 Dev Speed | Cloud Price |
|---|---|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | 1.0 TB/s | ~12–15 imgs/min | ~2–4 imgs/min | $0.35–0.74/hr |
| A40 | 48 GB GDDR6 | 0.7 TB/s | ~8–10 imgs/min | ~1.5–3 imgs/min | $0.79–1.49/hr |
| L40S | 48 GB GDDR6 | 0.9 TB/s | ~10–13 imgs/min | ~2–4 imgs/min | $1.49–2.49/hr |
| A100 80GB | 80 GB HBM2e | 2.0 TB/s | ~10–14 imgs/min | ~3–5 imgs/min | $1.49–2.20/hr |
| RTX 3090 | 24 GB GDDR6X | 0.9 TB/s | ~8–10 imgs/min | ~1.5–2.5 imgs/min | $0.20–0.44/hr |
The RTX 4090 is the standout for image generation: its GDDR6X bandwidth and Ada Lovelace architecture make it faster than much pricier data center GPUs for single-image generation. The A40 and L40S shine for batch workloads where the extra VRAM lets you run larger batches simultaneously.
## Provider Pricing Comparison for Image Generation GPUs
| Provider | RTX 4090 Price | A40 Price | L40S Price | Notes |
|---|---|---|---|---|
| Vast.ai | $0.35–0.65/hr | $0.60–1.10/hr | Rare | Cheapest, peer-to-peer, variable reliability |
| RunPod | $0.74/hr on-demand | $0.79–0.99/hr | $1.79/hr | Reliable, good SD templates |
| RunPod (spot) | $0.35–0.55/hr | $0.45–0.79/hr | $0.89–1.49/hr | 40–70% cheaper, interruptible |
| Lambda Labs | Not available | Not available | Not available | Focuses on A100/H100-class instances |
| Paperspace | $0.45/hr | $0.76/hr | Not available | Gradient notebooks, easy SD setup |
| JarvisLabs | Not available | $0.89/hr | $1.49/hr | Pre-installed SD/ComfyUI templates |
## Images Per Dollar: RTX 4090 Wins for Most Use Cases
For most Stable Diffusion users, the RTX 4090 at $0.35–0.50/hr on Vast.ai or RunPod spot delivers the best images-per-dollar. At 12 images/minute (20 steps, SDXL), you generate ~720 images/hour. At $0.50/hr, that's roughly $0.0007 per image — compared to Midjourney's ~$0.02 per image on the Pro plan.
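That arithmetic is easy to generalize to your own throughput and rental rate. A small helper; the numbers are this article's estimates, not benchmarks:

```python
def cost_per_image(hourly_rate: float, images_per_minute: float) -> float:
    """Dollar cost per generated image at a given rental rate and throughput."""
    images_per_hour = images_per_minute * 60
    return hourly_rate / images_per_hour

# RTX 4090 spot at $0.50/hr generating ~12 SDXL images/min (20 steps):
print(round(cost_per_image(0.50, 12), 5))  # ~0.00069 per image
```

Even at the pessimistic end of the speed estimates, the result stays well under a cent per image.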
## When to Use an A40 or L40S Instead

- Running Flux.1 Dev together with ControlNet or IP-Adapter (needs 30–40 GB VRAM)
- Batch generation pipelines producing thousands of images at once
- Keeping multiple SD models loaded simultaneously (the A40's 48 GB fits 2–3 full SDXL models)
- SDXL base + refiner pipeline (requires 16–20 GB with both models in memory)
- High-resolution generation at 2048×2048, or upscaling with ESRGAN loaded alongside the base model
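On a 48 GB card, the simplest way to exploit the extra headroom with diffusers is to raise the per-prompt batch size. A sketch, again assuming standard diffusers APIs:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# With ~48 GB of VRAM you can push num_images_per_prompt well past what a
# 24 GB card tolerates; raise it until you approach the memory limit.
images = pipe(
    "product photo of a ceramic mug, studio lighting",
    num_inference_steps=20,
    num_images_per_prompt=8,
).images

for i, img in enumerate(images):
    img.save(f"batch_{i:02d}.png")
```

Larger batches amortize the per-step overhead, which is how the A40 and L40S claw back throughput despite slower single-image speeds.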
## Cloud vs Local: When Does It Make Sense?
If you already own an RTX 3090 or 4090, running Stable Diffusion locally is usually cheaper for casual use (a few hundred images per week). The math flips for power users: generating 5,000+ images/week means your GPU is running nearly continuously, and a cloud RTX 4090 at $0.40/hr for 8 active hours/day costs $97/month — cheaper than the electricity plus amortized GPU cost of a high-end local workstation.
| Usage Pattern | Best Approach | Estimated Monthly Cost |
|---|---|---|
| < 500 images/month, casual | Local RTX 3090/4090 | ~$5–10 (electricity) |
| 500–5000 images/month, active hobbyist | Local or cloud RTX 4090 spot | $20–80/month (cloud) |
| 5000+ images/month, power user | Cloud RTX 4090 (dedicated) | $80–200/month |
| Commercial batch generation (100K+/month) | Cloud A40 or L40S cluster | $300–1,500/month |
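To run the comparison for your own setup, the monthly math in this section reduces to a couple of lines. The GPU wattage and electricity price below are placeholder assumptions; substitute your own:

```python
def cloud_monthly(hourly_rate: float, active_hours_per_day: float,
                  days: float = 30.4) -> float:
    """Monthly cost of renting a cloud GPU only while actively generating."""
    return hourly_rate * active_hours_per_day * days

def local_monthly(gpu_watts: float, active_hours_per_day: float,
                  kwh_price: float, days: float = 30.4) -> float:
    """Electricity-only monthly cost of a local GPU (ignores purchase amortization)."""
    return (gpu_watts / 1000) * active_hours_per_day * days * kwh_price

# Cloud RTX 4090 at $0.40/hr, 8 active hours/day (the article's example):
print(round(cloud_monthly(0.40, 8)))       # ~$97/month

# Local GPU drawing ~400 W at $0.15/kWh for the same duty cycle:
print(round(local_monthly(400, 8, 0.15)))  # ~$15/month electricity
```

Electricity alone favors local; the cloud only wins once you add the amortized cost of the workstation itself, which is why the break-even point sits with heavy, sustained use.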
## Getting Started: Recommended Setups

- AUTOMATIC1111 / ComfyUI on RunPod: use the pre-built 'Stable Diffusion' template — everything pre-installed, with Jupyter and SSH access
- Vast.ai for the cheapest RTX 4090: filter by RTX 4090 and reliability > 99%, then install ComfyUI via a setup script
- JarvisLabs: best for pure notebook users — pre-installed ComfyUI and A1111 with one-click start
- Batch pipelines: use RunPod Serverless with a custom Docker image for pay-per-generation billing