TarFlow: Scalable Normalizing Flows

TL;DR

Normalizing flows are far more capable than previously believed. TarFlow, a simple and scalable Transformer-based normalizing flow, models and generates pixels directly. It sets a new state of the art in image likelihood estimation and, for the first time with a stand-alone NF, produces samples whose quality matches diffusion models.

Abstract

Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model.
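The noise augmentation and post-training denoising mentioned in the abstract can be illustrated on a toy model. This is a minimal sketch under our own assumptions: the model is trained on y = x + sigma * eps, and denoising is taken to be a single score-based (Tweedie) step, x_hat = y + sigma^2 * grad_y log p(y), using the density of noisy inputs. A 1-D Gaussian with a closed-form score stands in for the flow so the result can be checked against the exact posterior mean; `gaussian_score` and the values of m, s and sigma are hypothetical.

```python
import numpy as np

def augment(x, sigma, rng):
    """Training-time noise augmentation: the model is fit to y = x + sigma * eps, eps ~ N(0, I)."""
    return x + sigma * rng.standard_normal(x.shape)

def tweedie_denoise(y, sigma, score_fn):
    """Single score-based denoising step (Tweedie's formula):
    E[x | y] = y + sigma^2 * grad_y log p_sigma(y), where p_sigma is the density of noisy data.
    """
    return y + sigma ** 2 * score_fn(y)

# Toy stand-in for a trained model: x ~ N(m, s^2), hence y ~ N(m, s^2 + sigma^2).
m, s, sigma = 2.0, 0.5, 0.3

def gaussian_score(y):
    """Exact grad_y log p_sigma(y) for the toy Gaussian (hypothetical model)."""
    return -(y - m) / (s ** 2 + sigma ** 2)

rng = np.random.default_rng(0)
x = m + s * rng.standard_normal(100_000)
y = augment(x, sigma, rng)                  # noisy inputs, as seen during training
x_hat = tweedie_denoise(y, sigma, gaussian_score)

# In the Gaussian case the posterior mean E[x | y] is known in closed form.
posterior_mean = m + s ** 2 / (s ** 2 + sigma ** 2) * (y - m)
mse_noisy = np.mean((y - x) ** 2)           # expected value sigma^2 = 0.09
mse_denoised = np.mean((x_hat - x) ** 2)    # expected value s^2 sigma^2 / (s^2 + sigma^2), about 0.066
print(np.allclose(x_hat, posterior_mean), mse_noisy, mse_denoised)
```

For an actual flow, the score would come from differentiating the model's log-likelihood with respect to its input, which is available because an NF evaluates log p(y) exactly.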

Method

Overview figure from the paper; see the paper for full details.
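A minimal NumPy sketch of the architecture described in the abstract: the image is cut into a sequence of patches, and each layer is an MAF-style affine autoregressive transform over that sequence, with the patch order reversed between layers. All of the following are our assumptions, not the paper's code: a strictly causal linear map over the running mean of earlier patches (the hypothetical `causal_params`) stands in for the causal Transformer; the affine form z_t = (x_t - mu_t) * exp(-alpha_t) is the standard MAF parameterization; patch size, raster order and the tanh bound on log-scales are illustrative choices.

```python
import numpy as np

def patchify(img, p):
    """(H, W, C) image -> (T, p*p*C) sequence of non-overlapping patches in raster order."""
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape((H // p) * (W // p), p * p * C)

def causal_params(x, Wm, Wa):
    """Hypothetical causal conditioner: row t of (mu, alpha) depends only on x[:t].

    Stands in for a causal Transformer: the mean of previous patches is mapped
    linearly to a shift mu and a log-scale alpha (tanh keeps alpha bounded).
    """
    T = x.shape[0]
    prev = np.zeros_like(x)
    csum = np.cumsum(x, axis=0)
    prev[1:] = csum[:-1] / np.arange(1, T)[:, None]  # mean of x_0 .. x_{t-1}
    return prev @ Wm, np.tanh(prev @ Wa)

def flow_forward(x, Wm, Wa):
    """Data -> noise in one parallel pass; returns z and log|det dz/dx|."""
    mu, alpha = causal_params(x, Wm, Wa)
    z = (x - mu) * np.exp(-alpha)
    return z, -alpha.sum()  # triangular Jacobian with diagonal exp(-alpha)

def flow_inverse(z, Wm, Wa):
    """Noise -> data, one patch at a time (autoregressive sampling)."""
    x = np.zeros_like(z)
    for t in range(z.shape[0]):
        # Recomputed every step for clarity; row t only reads the already-filled x[:t].
        mu, alpha = causal_params(x, Wm, Wa)
        x[t] = z[t] * np.exp(alpha[t]) + mu[t]
    return x

def stack_forward(x, layers):
    """Apply all layers, reversing patch order after each so the direction alternates."""
    logdet = 0.0
    for Wm, Wa in layers:
        x, ld = flow_forward(x, Wm, Wa)
        logdet += ld
        x = x[::-1]  # flip autoregression direction for the next layer
    return x, logdet

def stack_inverse(z, layers):
    """Undo stack_forward: un-flip, then invert each layer, last layer first."""
    for Wm, Wa in reversed(layers):
        z = flow_inverse(z[::-1], Wm, Wa)
    return z

def log_likelihood(x, layers):
    """Change of variables: log p(x) = log N(z; 0, I) + sum of per-layer log-determinants."""
    z, logdet = stack_forward(x, layers)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2 * np.pi)
    return log_pz + logdet

rng = np.random.default_rng(0)
x = patchify(rng.standard_normal((8, 8, 3)), p=4)  # T = 4 patches, D = 48 values each
D = x.shape[1]
# Hypothetical weights for 4 layers; the small scale keeps the toy flow well conditioned.
layers = [(0.1 * rng.standard_normal((D, D)), 0.1 * rng.standard_normal((D, D)))
          for _ in range(4)]
z, logdet = stack_forward(x, layers)
x_rec = stack_inverse(z, layers)
print(np.allclose(x, x_rec), log_likelihood(x, layers))
```

The sketch makes the usual autoregressive-flow asymmetry visible: the data-to-noise direction used for likelihood training is one parallel pass per layer, while sampling inverts each layer patch by patch, because patch t cannot be computed until x[:t] is known.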

Key Contributions

A Transformer variant of Masked Autoregressive Flows: a stack of autoregressive Transformer blocks over image patches, alternating the autoregression direction between layers.

Three techniques that sharply improve sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and a guidance method that works in both class-conditional and unconditional settings.

New state-of-the-art likelihood estimation for images, beating the previous best methods by a large margin, and, for the first time from a stand-alone normalizing flow, sample quality and diversity comparable to diffusion models.
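Image likelihoods are conventionally reported in bits per dimension. A small sketch of converting a per-image negative log-likelihood in nats into that unit, assuming the model was trained on pixel values divided by a constant `scale` (a hypothetical parameter; how inputs were normalized must be checked before comparing numbers across papers).

```python
import math

def bits_per_dim(nll_nats, num_dims, scale=1.0):
    """Convert a per-image NLL in nats to bits per dimension.

    Assumes the model saw x = u / scale (optionally shifted), where u are dequantized
    pixel values. By the change of variables, -log p(u) = -log p(x) + num_dims * log(scale);
    a constant shift does not change the Jacobian. The NLL is first moved back to
    pixel units, then converted from nats to bits and averaged over dimensions.
    """
    nll_pixel_units = nll_nats + num_dims * math.log(scale)
    return nll_pixel_units / (num_dims * math.log(2))

# Example: a 32x32 RGB image has 3072 dimensions.
num_dims = 32 * 32 * 3
print(bits_per_dim(6000.0, num_dims))                # model trained directly in pixel units
print(bits_per_dim(-8500.0, num_dims, scale=127.5))  # model trained on u / 127.5 - 1; roughly 3.0
```

A negative NLL in the rescaled space is not an error: densities on a compressed range can exceed 1, and the log(scale) term restores the pixel-space value.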

BibTeX

@inproceedings{zhai2025normalizing,
  title     = {Normalizing Flows are Capable Generative Models},
  author    = {Zhai, Shuangfei and Zhang, Ruixiang and Nakkiran, Preetum and Berthelot, David and Gu, Jiatao and Zheng, Huangjie and Chen, Tianrong and Bautista, Miguel Angel and Jaitly, Navdeep and Susskind, Joshua},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025}
}