Stable Diffusion XL (SDXL) and Flux.1 have raised the bar for open-source image generation, but they've also raised the VRAM bar. SDXL at full quality needs 8–16 GB, and Flux.1 Dev pushes you to 24 GB or more. If your local GPU is struggling, cloud GPUs let you generate at full resolution without compromising quality or resorting to quantized workarounds.
## What GPU Do You Need for Stable Diffusion?
VRAM is the primary constraint for image generation. SDXL (1024×1024 native resolution) needs at least 8 GB VRAM for basic generation and 12–16 GB for full quality without memory optimization tricks. Flux.1 Dev and Schnell are significantly more demanding — expect 20–24 GB for comfortable full-resolution generation at batch size 1.
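If you do need to squeeze SDXL into less VRAM, the diffusers library exposes several of those memory optimization tricks. A minimal sketch; the model ID and flags below are standard diffusers usage, but verify them against your installed version:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in half precision to roughly halve weight memory.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Offload submodules to CPU between steps; trades speed for VRAM.
pipe.enable_model_cpu_offload()

# Decode the VAE in slices to avoid a large activation spike at 1024x1024.
pipe.enable_vae_slicing()

image = pipe("a lighthouse at dusk, volumetric light",
             num_inference_steps=20).images[0]
image.save("out.png")
```

With both options enabled, SDXL can run on cards well below the 12–16 GB "full quality" threshold, at the cost of slower generation.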
## GPU Specs for Stable Diffusion: RTX 4090 vs A40 vs L40S
| GPU | VRAM | Memory Bandwidth | SDXL Speed (img/min) | Flux.1 Dev Speed | Cloud Price |
|---|---|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | 1.0 TB/s | ~12–15 imgs/min | ~2–4 imgs/min | $0.35–0.74/hr |
| A40 | 48 GB GDDR6 | 0.7 TB/s | ~8–10 imgs/min | ~1.5–3 imgs/min | $0.79–1.49/hr |
| L40S | 48 GB GDDR6 | 0.9 TB/s | ~10–13 imgs/min | ~2–4 imgs/min | $1.49–2.49/hr |
| A100 80GB | 80 GB HBM2e | 2.0 TB/s | ~10–14 imgs/min | ~3–5 imgs/min | $1.49–2.20/hr |
| RTX 3090 | 24 GB GDDR6X | 0.9 TB/s | ~8–10 imgs/min | ~1.5–2.5 imgs/min | $0.20–0.44/hr |
The RTX 4090 is the standout for image generation: its GDDR6X bandwidth and Ada Lovelace architecture make it faster than much pricier data center GPUs for single-image generation. The A40 and L40S shine for batch workloads where the extra VRAM lets you run larger batches simultaneously.
## Provider Pricing Comparison for Image Generation GPUs
| Provider | RTX 4090 Price | A40 Price | L40S Price | Notes |
|---|---|---|---|---|
| Vast.ai | $0.35–0.65/hr | $0.60–1.10/hr | Rare | Cheapest, peer-to-peer, variable reliability |
| RunPod | $0.74/hr on-demand | $0.79–0.99/hr | $1.79/hr | Reliable, good SD templates |
| RunPod (spot) | $0.35–0.55/hr | $0.45–0.79/hr | $0.89–1.49/hr | 40–70% cheaper, interruptible |
| Lambda Labs | Not available | Not available | Not available | Focuses on A100/H100-class instances |
| Paperspace | $0.45/hr | $0.76/hr | Not available | Gradient notebooks, easy SD setup |
| JarvisLabs | Not available | $0.89/hr | $1.49/hr | Pre-installed SD/ComfyUI templates |
## Images Per Dollar: RTX 4090 Wins for Most Use Cases
For most Stable Diffusion users, the RTX 4090 at $0.35–0.50/hr on Vast.ai or RunPod spot delivers the best images-per-dollar. At 12 images/minute (20 steps, SDXL), you generate ~720 images/hour. At $0.50/hr, that's roughly $0.0007 per image — compared to Midjourney's ~$0.02 per image on the Pro plan.
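That arithmetic is easy to generalize to your own throughput and rental rate. A small helper; the numbers are this article's estimates, not benchmarks:

```python
def cost_per_image(hourly_rate: float, images_per_minute: float) -> float:
    """Dollar cost per generated image at a given rental rate and throughput."""
    images_per_hour = images_per_minute * 60
    return hourly_rate / images_per_hour

# RTX 4090 spot at $0.50/hr generating ~12 SDXL images/min (20 steps):
print(round(cost_per_image(0.50, 12), 5))  # ~0.00069 per image
```

Even at the pessimistic end of the speed estimates, the result stays well under a cent per image.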
## When to Use an A40 or L40S Instead

- Running Flux.1 Dev together with ControlNet or IP-Adapter (needs 30–40 GB VRAM)
- Batch generation pipelines producing thousands of images at once
- Keeping multiple SD models loaded simultaneously (the A40's 48 GB fits 2–3 full SDXL models)
- SDXL base + refiner pipeline (requires 16–20 GB with both models in memory)
- High-resolution generation at 2048×2048, or upscaling with ESRGAN loaded alongside the base model
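On a 48 GB card, the simplest way to exploit the extra headroom with diffusers is to raise the per-prompt batch size. A sketch, again assuming standard diffusers APIs:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# With ~48 GB of VRAM you can push num_images_per_prompt well past what a
# 24 GB card tolerates; raise it until you approach the memory limit.
images = pipe(
    "product photo of a ceramic mug, studio lighting",
    num_inference_steps=20,
    num_images_per_prompt=8,
).images

for i, img in enumerate(images):
    img.save(f"batch_{i:02d}.png")
```

Larger batches amortize the per-step overhead, which is how the A40 and L40S claw back throughput despite slower single-image speeds.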
## Cloud vs Local: When Does It Make Sense?
If you already own an RTX 3090 or 4090, running Stable Diffusion locally is usually cheaper for casual use (a few hundred images per week). The math flips for power users: generating 5,000+ images/week means your GPU is running nearly continuously, and a cloud RTX 4090 at $0.40/hr for 8 active hours/day costs $97/month — cheaper than the electricity plus amortized GPU cost of a high-end local workstation.
| Usage Pattern | Best Approach | Estimated Monthly Cost |
|---|---|---|
| < 500 images/month, casual | Local RTX 3090/4090 | ~$5–10 (electricity) |
| 500–5000 images/month, active hobbyist | Local or cloud RTX 4090 spot | $20–80/month (cloud) |
| 5000+ images/month, power user | Cloud RTX 4090 (dedicated) | $80–200/month |
| Commercial batch generation (100K+/month) | Cloud A40 or L40S cluster | $300–1,500/month |
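To run the comparison for your own setup, the monthly math in this section reduces to a couple of lines. The GPU wattage and electricity price below are placeholder assumptions; substitute your own:

```python
def cloud_monthly(hourly_rate: float, active_hours_per_day: float,
                  days: float = 30.4) -> float:
    """Monthly cost of renting a cloud GPU only while actively generating."""
    return hourly_rate * active_hours_per_day * days

def local_monthly(gpu_watts: float, active_hours_per_day: float,
                  kwh_price: float, days: float = 30.4) -> float:
    """Electricity-only monthly cost of a local GPU (ignores purchase amortization)."""
    return (gpu_watts / 1000) * active_hours_per_day * days * kwh_price

# Cloud RTX 4090 at $0.40/hr, 8 active hours/day (the article's example):
print(round(cloud_monthly(0.40, 8)))       # ~$97/month

# Local GPU drawing ~400 W at $0.15/kWh for the same duty cycle:
print(round(local_monthly(400, 8, 0.15)))  # ~$15/month electricity
```

Electricity alone favors local; the cloud only wins once you add the amortized cost of the workstation itself, which is why the break-even point sits with heavy, sustained use.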
## Getting Started: Recommended Setups

- AUTOMATIC1111 / ComfyUI on RunPod: use the pre-built 'Stable Diffusion' template — everything pre-installed, with Jupyter and SSH access
- Vast.ai for the cheapest RTX 4090: filter by RTX 4090 and reliability > 99%, then install ComfyUI via a setup script
- JarvisLabs: best for pure notebook users — pre-installed ComfyUI and A1111 with one-click start
- Batch pipelines: use RunPod Serverless with a custom Docker image for pay-per-generation billing