ComfyLab
ComfyUI Upscale Workflow: Ultimate SD Upscale Setup (Free JSON Download)


8GB VRAM · Intermediate · 12 min · RealESRGAN_x4plus / 4x-UltraSharp
Savien

Generating 1024x1024 images feels impressive until you need a poster, a high-density display mockup, or professional print output. That's when generating directly at 4K (3840x2160) either crashes your GPU or fills the image with artifacts and repeated patterns. The answer isn't brute force but strategy: a ComfyUI upscale workflow that splits the processing into manageable tiles.

ComfyUI offers multiple upscaling philosophies, each with tradeoffs. This guide walks you through the most reliable method: pixel-space upscaling with tiled processing, which lets an 8GB GPU accomplish what would normally demand 24GB of VRAM. You’ll learn which Super-Resolution models to use, how to configure the Ultimate SD Upscale node, and the critical VAE Decode trick that prevents crashes at the final step.

🏗️ Workflow: Tiled Upscale with Ultimate SD Upscale

🧠 VRAM: 8GB+ 📡 MODEL: Any checkpoint + ESRGAN upscale model

This exact graph was built and test-run end to end on a local ComfyUI instance (768x768 source, RealESRGAN_x4 upscale model, 13 tiles, Chess mode, denoise 0.35). Point the UNETLoader/CheckpointLoader at your own model and it runs as-is. It uses the same node structure described below, including the built-in tiled_decode option on Ultimate SD Upscale, which removes the need for a separate VAE Decode (Tiled) node.

At a Glance: ComfyUI Upscale Comparison

| Approach | VRAM Required | Output Quality | Speed | Best For |
|---|---|---|---|---|
| Direct 4K generation | 24GB+ | ❌ Artifacts, crashes | Fast | Not viable |
| Latent upscale only | 16GB+ | ⚠️ Unstable, hallucination | Medium | Experimental workflows |
| Pixel + refinement (hybrid) | 8GB | ✅ Sharp, faithful | Medium | Professional 4K output |
| Iterative upscaling | 8GB | ✅ Maximum detail | Slow | High-quality work |

Why Direct 4K Generation Fails

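Generating images directly at 4K resolution hits two fundamental limits. The first is VRAM exhaustion. SD 1.5 and SDXL sample in a latent space that is 8x smaller than the image on each side, so a 3840x2160 image becomes a 480x270 latent. The latent tensor itself is small, but the UNet's activations grow with its area and self-attention cost grows roughly with the square of the number of latent positions, so memory use during the sampling loop balloons. Most consumer GPUs run out of VRAM before the sampler completes even one step.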
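To put rough numbers on this, the short Python sketch below compares latent size and self-attention growth between 1024x1024 and 3840x2160. It assumes an SD 1.5 / SDXL style VAE (8x downscaling per side, 4 latent channels) and treats attention cost as proportional to the token count squared; it estimates ratios only and does not measure real VRAM.

```python
# Rough arithmetic for why sampling directly at 4K is expensive.
# Assumes an SD 1.5 / SDXL style VAE: 8x spatial downscaling, 4 latent channels.
# Real VRAM use also depends on the UNet, precision and attention kernel,
# so read the results as growth ratios, not memory budgets.

def latent_shape(width, height, downscale=8, channels=4):
    """Latent tensor shape (channels, height, width) for a given image size."""
    return (channels, height // downscale, width // downscale)


def latent_bytes(width, height, bytes_per_value=2):
    """Size of the latent tensor itself, stored as fp16 (2 bytes per value)."""
    c, h, w = latent_shape(width, height)
    return c * h * w * bytes_per_value


def attention_growth(base, target, downscale=8):
    """Growth in latent positions (tokens) and in naive self-attention cost."""
    (bw, bh), (tw, th) = base, target
    base_tokens = (bw // downscale) * (bh // downscale)
    target_tokens = (tw // downscale) * (th // downscale)
    ratio = target_tokens / base_tokens
    return ratio, ratio ** 2  # attention compares every token with every other


if __name__ == "__main__":
    print(latent_shape(3840, 2160))  # (4, 270, 480)
    print(latent_bytes(3840, 2160))  # 1036800 bytes, roughly 1 MB
    tokens, attention = attention_growth((1024, 1024), (3840, 2160))
    print(f"{tokens:.1f}x more tokens, ~{attention:.0f}x the attention cost")
```

The latent itself is only about 1 MB. What grows is the work inside the UNet: roughly 7.9x more latent positions and, for naive attention, roughly 63x the cost. SDXL runs its attention at downsampled levels, but the ratio between the two resolutions stays the same.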

The second is artifact multiplication. SDXL was trained at around 1024x1024 (SD 1.5 at around 512x512), and far beyond its training resolution the model loses global coherence: the sampler produces repetition and tiling artifacts, characters get double faces or bodies, textures repeat in obvious patterns, and the composition breaks down entirely. Even a high-end card with enough memory to finish sampling can't brute-force its way past this wall.

The workaround is to generate at a manageable resolution (1024x1024 works well with SDXL) and then upscale intelligently in pixel space, where tiling keeps the memory needed for each sampling step bounded.

💡 Tip: Direct 4K generation crashes consumer GPUs due to VRAM limits and produces severe artifacts. Upscaling from a lower resolution is faster, more reliable, and produces better results.


Latent Upscale vs. Pixel Upscale: Which One?

ComfyUI supports two fundamentally different upscaling strategies, and the tradeoff between them shapes the rest of your ComfyUI upscale workflow.

Latent Upscale (Resize in Latent Space)

Latent upscaling resizes the compressed latent representation before decoding it to an image. The diffusion model then re-imagines the enlarged latent, adding detail as it samples.
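A minimal sketch of what the resize step itself does, assuming nearest-neighbour interpolation on a single latent channel (plain Python lists stand in for tensors). It shows that the enlarged latent holds no new information on its own; the new detail, and the hallucination risk, only appear once a sampler re-noises and denoises it.

```python
# Toy latent upscale: one latent channel as a 2D list, enlarged with
# nearest-neighbour interpolation. ComfyUI's Upscale Latent node offers several
# interpolation methods; nearest is used here only because it is easy to read.

def upscale_latent_nearest(channel, factor):
    """Repeat every latent value in a factor x factor block."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(factor)]
        # Append independent copies so later edits to one row don't leak into others.
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    latent = [[0.1, -0.4],
              [0.7, 0.2]]
    for row in upscale_latent_nearest(latent, 2):
        print(row)
```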

Advantages:

  • The AI actively generates new detail (skin pores, fabric weave, surface texture)
  • Can dramatically improve perceived sharpness and clarity
  • Feels like the model is “filling in” what wasn’t there before

Disadvantages:

  • Unstable; high denoise values cause the model to hallucinate and change the subject’s face, pose, or composition
  • Requires a lot of VRAM: sampling a 4K latent is still expensive
  • Prone to inconsistency across multiple passes
  • Difficult to predict the final output

Pixel Upscale (Super-Resolution Models)

Pixel upscaling decodes the latent into a real image first, then passes it through a specialized upscaling neural network (like ESRGAN or SwinIR). The model enlarges the existing pixels without generating new content.

Advantages:

  • Extremely faithful to the original image—shapes, colors, and composition remain exact
  • Stable and predictable; the same input always produces the same output
  • Works on any image, not just AI-generated ones
  • Much lower VRAM requirements

Disadvantages:

  • Cannot add information that wasn’t there; if the original is blurry, the upscale is a sharp blur
  • Cannot fix mistakes in the original generation
  • Depends entirely on the quality of the Super-Resolution model

| Latent Upscale | Pixel Upscale |
|---|---|
| ✅ Generates new detail actively | ✅ Extremely stable and predictable |
| ✅ High perceived sharpness | ✅ Faithful to original composition |
| ❌ Unstable at high denoise | ❌ Cannot add missing information |
| ❌ Requires massive VRAM | ❌ Only as good as the source image |
| ❌ Hallucination risk | ❌ Cannot fix generation mistakes |

💡 Quick take: Pixel upscale is more stable and VRAM-efficient; latent upscale generates more detail but risks hallucination. The hybrid approach combines both for best results.


Best Practice: The Hybrid Approach for 4K Upscaling in ComfyUI

The professional workflow combines both methods. Start with pixel upscaling to reach your target resolution (4K) using a Super-Resolution model. Then apply a refinement pass with a diffusion model at low denoise (0.2–0.35) to add fine detail and clean artifacts.

This preserves the original composition while adding the detail and polish that pure pixel upscaling lacks. The denoise stays low enough that the subject doesn’t change.


ESRGAN Models: Which One to Use?

ESRGAN-family models (along with alternatives such as the transformer-based SwinIR) are the workhorses of pixel upscaling. Each has a different strength, and picking the right one makes a visible difference in 4K ComfyUI upscaling workflows.

| Model | Best For | Trade-offs |
|---|---|---|
| 4x-UltraSharp | General-purpose, balanced | Sharp without artificial contrast; works for almost any image type |
| RealESRGAN_x4plus | Photography, realistic textures | Recovers shadow detail well; slightly softer than UltraSharp |
| SwinIR | Edge preservation, fewer artifacts | Transformer-based, slower, newer; fewer edge distortions on text and lines |
| Real-ESRGAN x2 | Moderate upscaling (2x) | Lower memory, faster; use when 4x is overkill |
| R-ESRGAN 4x+ Anime6B | Anime and stylized art | Smoother results without unnecessary noise; specialized for non-photorealistic content |

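Installation: Download the .pth file and place it in ComfyUI/models/upscale_models/. ComfyUI picks up models from this folder and lists them in the Load Upscale Model node; refresh the page or restart ComfyUI if a newly added file doesn't appear.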
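If a model doesn't show up, a quick script can confirm the file landed in the right folder. This is a hypothetical helper, not part of ComfyUI; COMFYUI_DIR is a placeholder path and the accepted extensions are an assumption.

```python
# Quick check of which upscale model files ComfyUI should find.
# COMFYUI_DIR is a placeholder for your install; the extension set is an
# assumption covering common weight formats.
from pathlib import Path

COMFYUI_DIR = Path("ComfyUI")  # hypothetical location, change to your install
MODEL_EXTENSIONS = {".pth", ".pt", ".safetensors"}


def filter_model_files(names):
    """Keep only file names that look like model weights, sorted."""
    return sorted(n for n in names if Path(n).suffix.lower() in MODEL_EXTENSIONS)


def list_upscale_models(comfy_dir):
    """Model files in <comfy_dir>/models/upscale_models/, or [] if it's missing."""
    folder = Path(comfy_dir) / "models" / "upscale_models"
    if not folder.is_dir():
        return []
    return filter_model_files(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    found = list_upscale_models(COMFYUI_DIR)
    print(found or "No upscale models found; check the folder path.")
```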

For most workflows, 4x-UltraSharp is the safest default. It’s sharp without introducing artificial contrast, and it handles portraits, landscapes, and stylized art equally well.

📌 Keep in mind: 4x-UltraSharp is the best general-purpose model; use RealESRGAN_x4plus for photography and R-ESRGAN 4x+ Anime6B for anime. Test on a small sample if quality is critical.


The Ultimate SD Upscale Node: Tiled Processing Explained

The Ultimate SD Upscale node (ssitu's ComfyUI port of Coyote-A's Ultimate SD Upscale script for Automatic1111) is the key to scaling beyond your GPU's limits. It implements tiled upscaling: the image is split into overlapping tiles (typically 512px), each tile is processed separately by the diffusion model, and the tiles are then blended back together so the joins don't show.

An 8GB GPU can thus handle work that would normally require 24GB.
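The arithmetic behind that claim is simple: the sampler only ever works on one tile, so per-step memory follows tile size rather than output size. A sketch, assuming a plain grid of square tiles and ignoring padding and seam-fix passes (the node's exact tiling logic may differ):

```python
import math

# Tile-grid arithmetic for tiled upscaling. Assumes the image is covered by a
# plain grid of tile_width x tile_height tiles; padding enlarges each crop but
# not the tile count, and optional seam-fix passes are ignored.

def tile_grid(width, height, tile_width=512, tile_height=512):
    """Return (columns, rows, total_tiles) needed to cover the image."""
    cols = math.ceil(width / tile_width)
    rows = math.ceil(height / tile_height)
    return cols, rows, cols * rows


def tile_area_fraction(width, height, tile_width=512, tile_height=512):
    """Share of the full image that one tile represents."""
    return (tile_width * tile_height) / (width * height)


if __name__ == "__main__":
    print(tile_grid(3840, 2160))   # (8, 5, 40)
    print(tile_grid(4096, 4096))   # (8, 8, 64)
    print(f"{tile_area_fraction(3840, 2160):.1%} of the image per tile")
```

At 3840x2160, a 512px tile is about 3% of the image, which is why a modest card can get through the whole frame one tile at a time.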

Critical Configuration Parameters

upscale_by: The multiplier. Set it to 4 if you start from 1024px and want 4K-class output (a 1024x1024 source becomes 4096x4096). If you've already pixel-upscaled in a prior step, set it to 1 (no further scaling, just refinement).
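Because upscale_by multiplies both sides, it's worth checking what size you'll actually get. A small sketch; rounding to whole pixels is an assumption, and the node may align dimensions differently:

```python
# upscale_by arithmetic. Rounding to whole pixels is an assumption; the node
# may round or align dimensions differently.

def upscale_result(width, height, upscale_by):
    """Output size for a given upscale_by multiplier."""
    return round(width * upscale_by), round(height * upscale_by)


def factor_to_fit(width, height, target_width, target_height):
    """Largest multiplier that keeps the output inside the target frame."""
    return min(target_width / width, target_height / height)


if __name__ == "__main__":
    print(upscale_result(1024, 1024, 4))          # (4096, 4096)
    print(factor_to_fit(1024, 1024, 3840, 2160))  # 2.109375, limited by height
    print(factor_to_fit(1344, 768, 3840, 2160))   # 2.8125 for a wide SDXL size
```

A square 1024px source can only reach 2160px tall inside a 3840x2160 frame, so true 16:9 4K needs a wide source or a crop.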

upscale_model: The Super-Resolution model to use before the sampler. Select 4x-UltraSharp for balanced results.

mode_type: Determines tile processing order. Chess (a checkerboard pattern) is usually the best choice: it refines alternating tiles first, so every remaining tile is sampled next to neighbours that are already refined, which makes seams far less visible.
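The difference between the two orders can be sketched in a few lines. This is a simplified illustration of a checkerboard schedule, assuming "white" squares are processed before "black" ones; it isn't the node's actual code.

```python
# Simplified tile schedules. Linear walks the grid row by row; the checkerboard
# ("Chess") order refines all "white" squares first, then all "black" ones, so
# every black tile is sampled after its side neighbours are already refined.
# This is an illustration of the idea, not the node's exact code.

def linear_order(cols, rows):
    """Row-major order: (col, row) pairs left to right, top to bottom."""
    return [(c, r) for r in range(rows) for c in range(cols)]


def chess_order(cols, rows):
    """White squares (col + row even) first, then black squares."""
    tiles = linear_order(cols, rows)
    white = [t for t in tiles if sum(t) % 2 == 0]
    black = [t for t in tiles if sum(t) % 2 == 1]
    return white + black


if __name__ == "__main__":
    print(linear_order(3, 2))
    print(chess_order(3, 2))
```

In the chess order, every black tile comes after all of its side neighbours, which is the property that gives the sampler refined context on each side.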

denoise: The most critical value. This controls how much the diffusion model re-imagines each tile:

  • 0.2–0.3: Cleaning and sharpening only; minimal change to the original
  • 0.35–0.45: Adds new detail and texture; recommended for most workflows
  • 0.5+: The image starts changing shape unpredictably; faces and backgrounds may shift

For upscaling, keep denoise between 0.35 and 0.45 unless you specifically want to alter the original image.

tile_padding: Extra margin around each tile so the sampler sees adjacent tile context. Increase to 32 or 64 to reduce visible seams.
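A simplified model of what padding does to each tile's read region, assuming a rectangular crop clamped to the image edges (the node's real crop and mask handling is more involved):

```python
# Simplified geometry of tile_padding: the region a tile reads from the image,
# with `padding` pixels of context on every side, clamped to the image edges.
# The node's real crop handling (masks, resizing) is more involved.

def padded_crop(col, row, tile, padding, width, height):
    """Return (left, top, right, bottom) of the padded read region."""
    left = max(col * tile - padding, 0)
    top = max(row * tile - padding, 0)
    right = min((col + 1) * tile + padding, width)
    bottom = min((row + 1) * tile + padding, height)
    return left, top, right, bottom


if __name__ == "__main__":
    for padding in (0, 32, 64):
        box = padded_crop(1, 1, 512, padding, 3840, 2160)
        print(padding, box, "->", box[2] - box[0], "px wide")
```

With 512px tiles and padding 32, an interior tile reads a 576x576 region, and neighbouring tiles share a 64px band of context. Padding 64 widens that band to 128px at the cost of more pixels per tile.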

⚠️ Important: Set denoise between 0.35–0.45, use Chess mode, and increase tile_padding to 32–64 to eliminate seams and artifacts in your Ultimate SD Upscale ComfyUI workflow.


Solving the Final VRAM Crash: Tiled Decode

Even with Ultimate SD Upscale working perfectly, ComfyUI can still crash at the very end if the decode step tries to process the entire 4K image in one pass instead of tile by tile.

Solution: The Ultimate SD Upscale node has a tiled_decode toggle built directly into it. Enable it and each tile gets decoded separately instead of the full output being decoded in one shot, so decode memory stays roughly constant regardless of output resolution; that is what makes 8K or 16K output feasible on modest hardware. (Older workflows wire in a separate VAE Decode (Tiled) node, which ships with core ComfyUI, for the same effect. It is functionally equivalent, but the built-in toggle is simpler if your node version has it.)

This single setting often means the difference between a working 4K workflow and an out-of-memory crash.
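The idea behind tiled decoding can be shown with a toy decoder. In this sketch fake_decode is a hypothetical stand-in for a VAE decoder that maps each latent value to an 8x8 pixel block; because it ignores neighbouring latents, tiles need no overlap here, whereas real tiled decoders overlap and blend tiles to hide seams.

```python
# Toy tiled decode. fake_decode is a stand-in "VAE decoder" that turns each
# latent value into an 8x8 pixel block; it only looks at one latent value at a
# time, so tiles need no overlap here. A real VAE decoder also looks at
# neighbouring latents, which is why real tiled decoders overlap and blend tiles.
SCALE = 8


def fake_decode(latent):
    """Stand-in decoder: upsample a 2D latent by SCALE in both directions."""
    return [[v for v in row for _ in range(SCALE)]
            for row in latent for _ in range(SCALE)]


def tiled_decode(latent, tile=2):
    """Decode tile x tile latent chunks one at a time and paste the pixels."""
    h, w = len(latent), len(latent[0])
    out = [[None] * (w * SCALE) for _ in range(h * SCALE)]
    peak = 0  # largest number of pixels produced by a single decode call
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            chunk = [row[tx:tx + tile] for row in latent[ty:ty + tile]]
            pixels = fake_decode(chunk)
            peak = max(peak, len(pixels) * len(pixels[0]))
            for y, prow in enumerate(pixels):
                out[ty * SCALE + y][tx * SCALE:tx * SCALE + len(prow)] = prow
    return out, peak


if __name__ == "__main__":
    latent = [[(x + y) % 5 for x in range(6)] for y in range(4)]
    full = fake_decode(latent)
    tiled, peak = tiled_decode(latent)
    print(tiled == full, peak, "pixels per call vs", len(full) * len(full[0]))
```

The tiled output matches the full decode, while the largest single decode call shrinks from 1536 to 256 pixels: peak work follows the tile, not the image.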


Recommended Workflow Order

Follow this sequence for reliable 4K upscaling:

Step 1: Generate the base image. Generate at 1024x1024 using SDXL with your desired prompt and sampling settings.

Step 2: Pixel upscale (optional but recommended). Use Upscale Image (using Model) with 4x-UltraSharp to scale to 4K. This gives the refinement pass a sharp, high-detail base.

Step 3: Ultimate SD Upscale refinement

  • Set upscale_by=1 (already scaled in Step 2)
  • Set denoise=0.35 (pushing much past 0.4 starts to cause unwanted face/background changes)
  • Use the same prompt as the original
  • Set mode_type=Chess and tile_padding=32

Step 4: Enable tiled decode. Turn on the tiled_decode option on the Ultimate SD Upscale node itself to avoid VRAM crashes at the final step.

This workflow keeps VRAM usage manageable while producing sharp, detailed 4K output with minimal artifacts.
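The sequence above can be sketched as a ComfyUI API-format prompt in Python. Treat it as a starting point, not a verified export: the UltimateSDUpscale input names follow ssitu's ComfyUI_UltimateSDUpscale pack as we understand it and may differ in your version, and the checkpoint, image, seed, steps, cfg and sampler values are hypothetical placeholders. Step 1 is represented by loading an already generated 1024px image.

```python
# A sketch of the recommended order (Steps 2-4) as a ComfyUI API-format prompt.
# Assumptions, not verified against your install: the UltimateSDUpscale input
# names follow ssitu's ComfyUI_UltimateSDUpscale pack, and every file name and
# sampler value below is a placeholder. Export your own graph in API format to
# confirm the exact inputs your node version expects.
import json
import urllib.request


def build_workflow(image_name="base_1024.png", prompt="your original prompt"):
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "your_sdxl_checkpoint.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["1", 1]}},
        # Step 1 output, loaded from ComfyUI's input folder.
        "4": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        # Step 2: pixel upscale with a Super-Resolution model.
        "5": {"class_type": "UpscaleModelLoader",
              "inputs": {"model_name": "4x-UltraSharp.pth"}},
        "6": {"class_type": "ImageUpscaleWithModel",
              "inputs": {"upscale_model": ["5", 0], "image": ["4", 0]}},
        # Steps 3-4: tiled refinement at upscale_by=1 with tiled decode on.
        "7": {"class_type": "UltimateSDUpscale",
              "inputs": {
                  "image": ["6", 0], "model": ["1", 0],
                  "positive": ["2", 0], "negative": ["3", 0], "vae": ["1", 2],
                  "upscale_by": 1.0, "seed": 0, "steps": 20, "cfg": 7.0,
                  "sampler_name": "euler", "scheduler": "normal",
                  "denoise": 0.35, "upscale_model": ["5", 0],
                  "mode_type": "Chess", "tile_width": 512, "tile_height": 512,
                  "mask_blur": 8, "tile_padding": 32,
                  "seam_fix_mode": "None", "seam_fix_denoise": 1.0,
                  "seam_fix_width": 64, "seam_fix_mask_blur": 8,
                  "seam_fix_padding": 16, "force_uniform_tiles": True,
                  "tiled_decode": True}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "upscale_4k"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    print(json.dumps(build_workflow(), indent=2)[:300])
```

To run it, call queue_prompt(build_workflow()) against a running ComfyUI server (the default local address is assumed). Comparing the dict with your own graph exported in API format is the quickest way to catch renamed inputs.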


Avoiding Seams and Artifacts

Visible seams between tiles or double-face/double-body artifacts signal configuration problems. Seams appear when tile padding is too low—increase to 32 or 64. Mask blur can also be insufficient; bump it up to 8–16. Verify that mode_type=Chess so the sampler sees adjacent tile context.
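Why more overlap and blur help can be seen in one dimension. A sketch, assuming a linear ramp as a stand-in for the soft mask that mask_blur creates (the node's real mask is a blurred 2D shape):

```python
# 1D illustration of why blending overlapping tiles hides seams. The linear
# ramp is an assumed stand-in for the soft tile mask that mask_blur produces;
# the node's actual mask is a blurred 2D shape.

def feather_weights(overlap):
    """Weights for the left tile across the overlap, falling from ~1 to ~0."""
    return [1 - (i + 1) / (overlap + 1) for i in range(overlap)]


def blend_overlap(left, right, overlap):
    """Join two pixel rows whose last/first `overlap` pixels cover the same spot."""
    if overlap == 0:
        return left + right  # hard cut: any difference becomes a visible seam
    weights = feather_weights(overlap)
    mixed = [w * a + (1 - w) * b
             for w, a, b in zip(weights, left[-overlap:], right[:overlap])]
    return left[:-overlap] + mixed + right[overlap:]


if __name__ == "__main__":
    left, right = [10.0] * 6, [20.0] * 6  # two tiles that disagree
    print(blend_overlap(left, right, 0))
    print([round(v, 1) for v in blend_overlap(left, right, 4)])
```

The hard cut jumps by 10 in a single pixel; the feathered version spreads the same difference across the overlap in steps of 2, which the eye reads as continuous.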

Double features appear when denoise is too high: the sampler tries to generate a full image inside a small tile, so lower denoise below 0.4. Another culprit is a missing upscale model selection. When the sampler works on plainly interpolated pixels, errors multiply fast.

💡 Tip: Add a ControlNet Tile pass after Ultimate SD Upscale for advanced seam reduction. This forces the sampler to respect the original image’s structure per tile, nearly eliminating seams.


Iterative Upscaling for Maximum Quality

Jumping straight from 1024px to 4K in one pass is fast but produces fewer details and more errors than iterative upscaling.

Better approach:

  1. Upscale from 1024px to 2K using Ultimate SD Upscale with denoise=0.35
  2. Refine at 2K (optional but recommended)
  3. Upscale from 2K to 4K using Ultimate SD Upscale with denoise=0.3 (lower, since detail is already present)
  4. Final refinement at 4K

Each pass adds detail and reduces artifacts. The tradeoff is time: iterative upscaling takes noticeably longer than a direct one-pass approach, since you're running the refinement sampler twice.
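The pass schedule can be planned with a small helper. This is a hypothetical sketch: it splits the upscale into steps of at most 2x on the longest side and applies the denoise values from the list above (0.35, then 0.3), without modelling the optional refinement passes at each size.

```python
# Planning iterative passes: split one large upscale into steps of at most
# max_factor, using 0.35 denoise for the first pass and 0.3 afterwards.
# Sizes are the image's longest side in pixels.

def plan_passes(start, target, max_factor=2.0, first_denoise=0.35, later_denoise=0.3):
    passes = []
    size = start
    while size < target:
        factor = min(max_factor, target / size)
        new_size = max(round(size * factor), size + 1)  # always make progress
        denoise = first_denoise if not passes else later_denoise
        passes.append((size, new_size, round(factor, 4), denoise))
        size = new_size
    return passes


if __name__ == "__main__":
    for src, dst, factor, denoise in plan_passes(1024, 4096):
        print(f"{src}px -> {dst}px  upscale_by={factor}  denoise={denoise}")
```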


Video Upscaling: The Same Nodes, Different Challenges

The same Ultimate SD Upscale and tiled VAE decode nodes work for video, but the time cost scales with the frame count: a 100-frame video takes roughly as long as upscaling 100 single images sequentially.

For professional video upscaling, dedicated tools or frame-skipping techniques (upscaling every other frame, then interpolating the frames in between) are faster than processing every frame individually through ComfyUI.
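Since frames are processed one after another, a back-of-the-envelope estimate is enough to compare the two approaches. In this sketch seconds_per_frame is whatever a single image takes on your machine (the 90 s below is a made-up figure, not a benchmark), and interpolation time is not included.

```python
# Back-of-the-envelope video timing: frames are processed one after another,
# so total time scales with the number of frames you actually upscale.
# seconds_per_frame is a per-image time you measure yourself; interpolation
# time for skipped frames is not included.

def video_upscale_hours(frames, seconds_per_frame, every_nth=1):
    processed = -(-frames // every_nth)  # ceiling division
    return processed * seconds_per_frame / 3600


if __name__ == "__main__":
    print(video_upscale_hours(100, 90))               # 2.5
    print(video_upscale_hours(100, 90, every_nth=2))  # 1.25
```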


FAQ

Q: What’s the real difference between Latent and Pixel Upscale?

A: Latent Upscale stretches the latent representation before it becomes an image, generating lots of new detail but potentially distorting composition. Pixel Upscale scales the finished image using a specialized AI model (like ESRGAN) and then refines it, staying much more faithful to the original.

Q: How do I stop my GPU from running out of memory on a 4K image?

A: The key is tiled VAE decoding and tile-based upscaling. Instead of processing the whole 4K image at once, nodes like Ultimate SD Upscale split it into 512px or 1024px tiles and process them one at a time; this lets an 8GB GPU do work that would normally require 24GB.

Q: Why do I see grid lines or seams in my upscaled image?

A: It's usually caused by insufficient padding or too-high denoise. Increase Tile Padding to 32 or 64 in Ultimate SD Upscale and keep denoise between 0.3 and 0.4. The ‘Chess’ mode also helps blend these seams.

Q: Which upscale model is best for realistic photography?

A: For realism, 4x-UltraSharp and RealESRGAN_x4plus are the most widely used choices. For anime, R-ESRGAN 4x+ Anime6B gives smoother results without introducing unnecessary noise.


Keep Reading

The denoise, steps and sampler settings used here follow the same logic covered in our KSampler explained guide — worth reading if any of those parameters felt unfamiliar. If your GPU can’t handle 4K tiles even with this workflow, our best GPU for ComfyUI guide covers what to upgrade to.


🏆 Our Recommendation

If you’re upscaling for professional print or display work → go with the hybrid approach (pixel upscale + low-denoise refinement). If you prioritize speed and have limited time → use direct pixel upscaling with 4x-UltraSharp and skip the refinement pass. If you need maximum detail and don’t mind waiting → use iterative upscaling (1K→2K→4K). Start with the recommended workflow order above, test on a single image first, and adjust denoise and tile_padding based on your specific GPU and output requirements.
