Cosmos Predict2 Text to Image
Using Cosmos-Predict2 for text-to-image generation
If nodes are missing when you load the workflow file below, one of the following is the likely cause:
- You are not using the latest Development (Nightly) version of ComfyUI.
- You are using the Stable (Release) version or Desktop version of ComfyUI (which does not include the latest feature updates).
- You are using the latest Commit version of ComfyUI, but some nodes failed to import during startup.
Cosmos Predict2 Video2World Workflow
1. Workflow File
Please download the video below and drag it into ComfyUI to load the workflow. The workflow already has embedded model download links.
Download JSON Format Workflow File
Please download the following image as input:
2. Manual Model Installation
If the automatic model download did not succeed, you can download the models manually as described in this section.

Diffusion model
- cosmos_predict2_2B_video2world_480p_16fps.safetensors

For other weights, please visit Cosmos_Predict2_repackaged to download.

Text encoder
- oldt5_xxl_fp8_e4m3fn_scaled.safetensors

VAE
- wan_2.1_vae.safetensors

File Storage Location

3. Complete Workflow Step by Step
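
Before working through the steps, it can help to confirm that the three model files named in the manual installation section are where ComfyUI will look for them. The sketch below is only an illustration: it assumes the standard ComfyUI subfolders (models/diffusion_models, models/text_encoders, models/vae), and the helper name find_missing_models is hypothetical, not part of ComfyUI.

```python
from pathlib import Path

# Assumption: standard ComfyUI model subfolders. Adjust the keys if your
# installation stores models elsewhere.
EXPECTED_MODELS = {
    "diffusion_models": "cosmos_predict2_2B_video2world_480p_16fps.safetensors",
    "text_encoders": "oldt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae": "wan_2.1_vae.safetensors",
}


def find_missing_models(comfy_root):
    """Return the expected model paths that are absent under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, filename in EXPECTED_MODELS.items():
        candidate = models_dir / subfolder / filename
        if not candidate.is_file():
            missing.append(candidate)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder for the path to your own installation.
    for path in find_missing_models("ComfyUI"):
        print(f"missing: {path}")
```

If the script prints nothing, all three files were found in the assumed locations. Any path it prints is a file to download manually.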

- Ensure the Load Diffusion Model node has loaded cosmos_predict2_2B_video2world_480p_16fps.safetensors
- Ensure the Load CLIP node has loaded oldt5_xxl_fp8_e4m3fn_scaled.safetensors
- Ensure the Load VAE node has loaded wan_2.1_vae.safetensors
- Upload the provided input image in the Load Image node
- (Optional) If you need first and last frame control, use the shortcut Ctrl(cmd) + B to enable last frame input
- (Optional) You can modify the prompts in the ClipTextEncode node
- (Optional) Modify the size and frame count in the CosmosPredict2ImageToVideoLatent node
- Click the Run button or use the shortcut Ctrl(cmd) + Enter to run the workflow
- Once generation is complete, the video is automatically saved to the ComfyUI/output/ directory; you can also preview it in the Save Video node
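
Once a run has finished, the saved file can also be located from a script rather than by browsing the folder. The following is a minimal sketch, assuming the video is written with an .mp4 or .webm extension (the actual format depends on the save node's settings). The helper name latest_video is hypothetical.

```python
from pathlib import Path

# Assumption: generated videos use one of these extensions.
VIDEO_SUFFIXES = {".mp4", ".webm"}


def latest_video(output_dir):
    """Return the most recently modified video under output_dir, or None."""
    videos = [
        p
        for p in Path(output_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
    # max() with default=None handles an empty output directory.
    return max(videos, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # "ComfyUI/output" matches the directory named in the last step.
    print(latest_video("ComfyUI/output"))
```

The search is recursive because a filename prefix containing a slash places results in a subfolder of the output directory.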