
20.11.2025


Zigzag Your Way to Faster, Smarter AI Image Generation

MCML Research Insight - With Vincent Tao Hu, Olga Grebenkova, Pingchuan Ma, Johannes Schusterbauer, and Björn Ommer

State-of-the-art diffusion models like DiT and Stable Diffusion have made AI image generation incredibly powerful. But they still struggle with one big issue: scaling to large images or videos quickly and efficiently without exhausting your GPU memory. What if we could process images faster, use less memory, and still retain visual quality—without sacrificing the model’s expressiveness?

The paper by MCML Junior Members Vincent Tao Hu, Olga Grebenkova, Pingchuan Ma, and Johannes Schusterbauer, MCML PI Björn Ommer, and collaborators Stefan Andreas Baumann and Ming Gui introduces ZigMa, a plug-and-play diffusion backbone that combines the strengths of DiT-style architectures with Mamba’s efficient long-range sequence modeling.

«Our Zigzag Mamba method injects spatial continuity into Mamba’s scanning process, enabling high-resolution image and video generation with lower complexity.»


Vincent Tao Hu et al.

MCML Junior Members

The Insight: Scan Like a Human, Not Like a CPU

Mamba is a new state-space model (SSM) that can model long sequences very efficiently. But most vision models treat 2D images like flat 1D sequences using computer-style scanning (row-by-row or column-by-column). This breaks spatial continuity—like reading a painting line-by-line instead of seeing the whole picture.

The authors realized that this scan order matters a lot. So they designed Zigzag Scanning: a spatially continuous path that mimics how humans process visual information—preserving neighborhoods and image structure layer by layer (see Figure 1).

By combining this zigzag path with Mamba blocks in a DiT-style backbone, the team created a model that’s both memory-efficient and faster—without sacrificing expressiveness.

Figure 1: The 2D Image Scan. Our Mamba scan design is based on the sweep-scan scheme shown in subfigure (a). From this, we developed a zigzag-scan scheme, displayed in subfigure (b), to enhance the continuity of the patches, thereby maximizing the potential of the Mamba block. Since there are several possible arrangements for these continuous scans, we have listed the eight most common zigzag scans in subfigure (c).
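To make the scan-order idea concrete, here is a minimal sketch (not the authors’ code; zigzag_scan_order is a hypothetical helper) of the simplest zigzag variant: a boustrophedon path that reverses every other row, so consecutive tokens are always spatial neighbors:

```python
import numpy as np

def zigzag_scan_order(h: int, w: int) -> np.ndarray:
    """Return a zigzag (boustrophedon) ordering of an h x w patch grid.

    Unlike a plain raster sweep, every step moves to a spatially adjacent
    patch, which is the continuity property the zigzag scan is after.
    """
    idx = np.arange(h * w).reshape(h, w)
    idx[1::2] = idx[1::2, ::-1].copy()  # reverse every other row
    return idx.ravel()

# 4x4 grid: [ 0  1  2  3  7  6  5  4  8  9 10 11 15 14 13 12]
print(zigzag_scan_order(4, 4))
```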


«We introduce the notion of Order Receptive Field—the number of zigzag paths used across layers—to balance quality, speed, and memory.»


Vincent Tao Hu et al.

MCML Junior Members

Zigzag Mamba: A Drop-in Replacement for Transformers

The core component of ZigMa is a zero-parameter heterogeneous layerwise scan:

  • Each layer uses a different zigzag scan path from a pre-designed set (e.g., layer i uses path S[i % 8]; see the sketch below)
  • The scan paths are spatially continuous, providing an inductive bias that matches the structure of natural images
  • No extra computation or parameters are introduced compared to standard Mamba
  • Cross-attention blocks are added for text conditioning, enabling text-to-image (T2I) generation

This simple yet powerful design outperforms both Transformer-based (DiT, U-ViT) and SSM-based (VisionMamba) baselines on high-res datasets like FacesHQ (1024×1024) and MS COCO.
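A minimal PyTorch-style sketch of this layerwise rotation, assuming simple stand-ins (ZigzagStack and the linear blocks below are hypothetical placeholders; the real model uses Mamba blocks and the designed zigzag paths):

```python
import torch
import torch.nn as nn

class ZigzagStack(nn.Module):
    """Sketch of a zero-parameter heterogeneous layerwise scan: layer i
    permutes tokens along scan path S[i % k], applies a 1D sequence block,
    then restores the original spatial order."""

    def __init__(self, blocks, scan_paths):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.n_paths = len(scan_paths)
        for i, perm in enumerate(scan_paths):
            inv = torch.empty_like(perm)
            inv[perm] = torch.arange(perm.numel())   # inverse permutation
            self.register_buffer(f"perm_{i}", perm)  # buffers: no parameters
            self.register_buffer(f"inv_{i}", inv)

    def forward(self, x):  # x: (batch, n_tokens, dim)
        for i, block in enumerate(self.blocks):
            k = i % self.n_paths                      # rotate through S[i % k]
            x = x[:, getattr(self, f"perm_{k}")]      # reorder along zigzag
            x = block(x)                              # 1D scan (Mamba block)
            x = x[:, getattr(self, f"inv_{k}")]       # back to spatial order
        return x

# Stand-in usage: 16 tokens, 8 random paths, linear layers instead of Mamba.
paths = [torch.randperm(16) for _ in range(8)]
model = ZigzagStack([nn.Linear(32, 32) for _ in range(8)], paths)
out = model(torch.randn(2, 16, 32))
```

Because the permutations are fixed buffers, switching scan paths from layer to layer costs no parameters and only a token reshuffle per layer.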


«Our method can consistently maintain GPU efficiency as the number of scan paths increases—unlike Transformer-based or k-direction Mamba models.»


Vincent Tao Hu et al.

MCML Junior Members

Why It Matters: High-Resolution, Low-Overhead Generation

ZigMa pushes the boundary of scalability in diffusion models:

  • Images: Trained on 1024×1024 CelebA-HQ+FFHQ (FacesHQ), ZigMa outperforms VisionMamba while allowing higher batch sizes and better GPU utilization
  • Videos: On UCF101, ZigMa achieves superior FVD scores with fewer parameters, using a factorized 3D Zigzag Mamba design (sketched below)
  • Speed & Memory: ZigMa requires ~40% less GPU memory than DiT or U-ViT while delivering higher FPS, even with 8 scan paths

Theoretical Backing: Stochastic Interpolant

ZigMa is not just about clever scanning—it’s built on a strong probabilistic foundation.

By adopting the Stochastic Interpolant framework [Albergo et al.], the model learns to estimate both the score function and the velocity field for sampling under diffusion or flow-based formulations. This allows ZigMa to support both SDE and ODE samplers, enabling faster sampling without retraining.
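Schematically, with a simplified linear interpolant (the framework in Albergo et al. is more general), the connection looks like this:

```latex
% Simplified linear interpolant between data $x_0$ and noise $x_1$:
\[
  x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1].
\]
% The network regresses the velocity field along this path,
\[
  v(x, t) = \mathbb{E}\!\left[\,x_1 - x_0 \mid x_t = x\,\right],
\]
% which yields a deterministic (ODE) sampler $\mathrm{d}x = v(x,t)\,\mathrm{d}t$.
% Adding the score $s(x,t) = \nabla_x \log p_t(x)$ gives an SDE sampler with
% the same time marginals, for any diffusion schedule $\varepsilon(t) \ge 0$:
\[
  \mathrm{d}x = \bigl[v(x, t) + \varepsilon(t)\,s(x, t)\bigr]\mathrm{d}t
              + \sqrt{2\,\varepsilon(t)}\,\mathrm{d}W_t .
\]
```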


TL;DR

ZigMa shows that how you scan matters. A simple change, like scanning in a zigzag, can drastically improve the performance of generative models. It’s a plug-and-play improvement that works out of the box with diffusion frameworks, scales from images to videos, and retains compatibility with text conditioning.


Explore the Code, Paper & Demo

Check out the full paper, which appeared at ECCV 2024, one of the top-tier (A*) conferences in computer vision, where the team presents the complete method and results.

A* Conference
V. T. Hu • S. A. Baumann • M. Gui • O. Grebenkova • P. Ma • J. Schusterbauer • B. Ommer
ZigMa: A DiT-style Zigzag Mamba Diffusion Model.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub
Paper on arXiv
Code
Website

Share Your Research!



Are you an MCML Junior Member and interested in showcasing your research on our blog?

We’re happy to feature your work—get in touch with us to present your paper.

