Model and Framework Support Matrix
MindIE SD currently supports the vLLM Omni framework, the Cache DiT framework, and the Modelers community. In principle, MindIE SD can accelerate inference for any multimodal model; the tables below list the representative models and feature combinations that are supported today.
Model support
| Model | vLLM Omni | Cache DiT + diffusers | Modelers community |
|---|---|---|---|
| Stable Diffusion 1.5 | ✖️ | ✖️ | ✅ |
| Stable Diffusion 2.1 | ✖️ | ✖️ | ✅ |
| Stable Diffusion XL | ✖️ | ✖️ | ✅ |
| Stable Diffusion XL_inpainting | ✖️ | ✖️ | ✅ |
| Stable Diffusion XL_lighting | ✖️ | ✖️ | ✅ |
| Stable Diffusion XL_controlnet | ✖️ | ✖️ | ✅ |
| Stable Diffusion XL_prompt_weight | ✖️ | ✖️ | ✅ |
| Stable Diffusion 3 | ✖️ | ✖️ | ✅ |
| Stable Video Diffusion | ✖️ | ✖️ | ✅ |
| Stable Audio Open v1.0 | ✖️ | ✖️ | ✅ |
| OpenSora v1.2 | ✖️ | ✖️ | ✅ |
| OpenSoraPlan v1.2 | ✖️ | ✖️ | ✅ |
| OpenSoraPlan v1.3 | ✖️ | ✖️ | ✅ |
| CogView3-Plus-3B | ✖️ | ✖️ | ✅ |
| CogVideoX-2B | ✖️ | ✖️ | ✅ |
| CogVideoX-5B | ✖️ | ✖️ | ✅ |
| HunyuanDit | ✖️ | ✖️ | ✅ |
| HunyuanVideo | ✖️ | ✖️ | ✅ |
| HunyuanVideo-1.5 | ✖️ | ✖️ | ✅ |
| Hunyuan3D-2.1 | ✖️ | ✖️ | ✅ |
| Wan2.1 | ✖️ | ✖️ | ✅ |
| Wan2.2 | ✖️ | ✖️ | ✅ |
| FLUX.1-dev | ✅ | ✅ | ✅ |
| FLUX.2-dev | ✖️ | ✅ | ✅ |
| Qwen-Image | ✅ | ✖️ | ✅ |
| Qwen-Image-Edit | ✅ | ✖️ | ✅ |
| Qwen-Image-Edit-2509 | ✅ | ✖️ | ✅ |
| Z-Image | ✖️ | ✖️ | ✅ |
| Z-Image-Turbo | ✅ | ✖️ | ✅ |
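When scripting a deployment, it can help to have the support table in machine-readable form. The snippet below is a minimal sketch that copies a handful of rows from the table above into a plain Python dictionary and looks up which integration paths are listed for a model. The `SUPPORT` mapping and both helper functions are hypothetical names chosen for illustration; they are not part of MindIE SD or any of the listed frameworks.

```python
# Illustrative only: a hand-copied subset of the model support table.
# SUPPORT, frameworks_for, and models_for are hypothetical names,
# not part of any MindIE SD, vLLM Omni, or Cache DiT API.
SUPPORT = {
    "Wan2.2": {"Modelers community"},
    "FLUX.1-dev": {"vLLM Omni", "Cache DiT + diffusers", "Modelers community"},
    "FLUX.2-dev": {"Cache DiT + diffusers", "Modelers community"},
    "Qwen-Image": {"vLLM Omni", "Modelers community"},
    "Z-Image": {"Modelers community"},
    "Z-Image-Turbo": {"vLLM Omni", "Modelers community"},
}


def frameworks_for(model: str) -> list[str]:
    """Return the integration paths listed for `model`, sorted by name.

    An unknown model yields an empty list rather than an error.
    """
    return sorted(SUPPORT.get(model, set()))


def models_for(framework: str) -> list[str]:
    """Return the models (from the subset above) listed under `framework`."""
    return sorted(m for m, paths in SUPPORT.items() if framework in paths)


if __name__ == "__main__":
    print(frameworks_for("FLUX.2-dev"))
    print(models_for("vLLM Omni"))
```

Only six rows are copied here; extend the dictionary from the full table if more models are needed.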
vLLM Omni features and model performance
| Model | Hardware | Cache | Parallelism | Sparse FA | Quantization | Fused operators |
|---|---|---|---|---|---|---|
| FLUX.1-dev | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✅ | ✅ |
| Qwen-Image | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✖️ | ✅ |
| Qwen-Image-Edit | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✖️ | ✅ |
| Qwen-Image-Edit-2509 | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✖️ | ✅ |
| Z-Image-Turbo | Atlas 800I A2 server | ✅ | ✖️ | ✖️ | ✖️ | ✅ |
Note: Atlas 800I A2 servers have a default compute capacity of 313T and 64 GB of memory.
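To check whether a planned feature combination is covered before running on vLLM Omni, the feature table can be queried programmatically. The following is a small sketch, assuming the five rows above are transcribed by hand; the `Features` dataclass, the `VLLM_OMNI` mapping, and `models_with` are hypothetical names for illustration and do not correspond to any MindIE SD or vLLM Omni API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """One row of the vLLM Omni feature table (hypothetical structure)."""
    cache: bool
    parallelism: bool
    sparse_fa: bool
    quantization: bool
    fused_operators: bool


# Transcribed by hand from the vLLM Omni table; all rows are Atlas 800I A2.
VLLM_OMNI = {
    "FLUX.1-dev": Features(True, True, False, True, True),
    "Qwen-Image": Features(True, True, False, False, True),
    "Qwen-Image-Edit": Features(True, True, False, False, True),
    "Qwen-Image-Edit-2509": Features(True, True, False, False, True),
    "Z-Image-Turbo": Features(True, False, False, False, True),
}


def models_with(*required: str) -> list[str]:
    """Return models whose row marks every feature in `required` as supported.

    Feature names must match the Features field names, e.g. "cache".
    """
    return [
        name
        for name, row in VLLM_OMNI.items()
        if all(getattr(row, feature) for feature in required)
    ]


if __name__ == "__main__":
    print(models_with("cache", "quantization"))
    print(models_with("sparse_fa"))
```

A misspelled feature name raises `AttributeError`, which is deliberate here: a silent `False` would hide typos in the query.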
Cache DiT + diffusers features and model performance
| Model | Hardware | Cache | Parallelism | Sparse FA | Quantization | Fused operators |
|---|---|---|---|---|---|---|
| FLUX.1-dev | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✅ | ✅ |
| FLUX.2-dev | Atlas 800I A2 server | ✖️ | ✅ | ✖️ | ✖️ | ✅ |
Modelers community feature combinations and model performance
| Model | Hardware | Cache | Parallelism | Sparse FA | Quantization | Fused operators | Notes |
|---|---|---|---|---|---|---|---|
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | Functional integration complete |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | Functional integration complete |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | Functional integration complete |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | Functional integration complete |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | Atlas 800I A2 server | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✖️ | ✖️ | ✖️ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✅ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✅ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✅ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✅ | ✅ | ✖️ | ✅ | ✅ | None |
| | | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ | None |
| | | ✖️ | ✖️ | ✖️ | ✖️ | ✅ | None |
Note:
Atlas 300I DUO inference cards have a default compute capacity of 280T and 48 GB of memory.
Atlas 800I A2 servers have a default compute capacity of 313T and 64 GB of memory.
Atlas 800I A3 supernode servers have a default compute capacity of 560T and 64 GB of memory.