NFM

The Coupling Within: Flow Matching via Distilled Normalizing Flows

arXiv 2026

David Berthelot, Tianrong Chen, Jiatao Gu, Marco Cuturi, Laurent Dinh, Bhavik Chandna, Michal Klein, Joshua Susskind, Shuangfei Zhai

TL;DR

Normalized Flow Matching distills the quasi-deterministic noise↔data coupling of a pretrained autoregressive normalizing flow to train flow-matching students — outperforming independent and even optimal-transport couplings, and improving on the teacher flow itself.

Abstract

Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling measure for sampling noise/data pairs that define the flow matching (FM) regression loss. While FM training usually defaults to independent coupling, recent works show that adaptive couplings informed by noise/data distributions (e.g., via optimal transport, OT) improve both model training and inference. We radicalize this insight by shifting the paradigm: rather than computing adaptive couplings directly, we use distilled couplings from a different, pretrained model capable of placing noise and data spaces in bijection, a property intrinsic to normalizing flows (NF) through their maximum likelihood and invertibility requirements. Leveraging recent advances in NF image generation via auto-regressive (AR) blocks, we propose Normalized Flow Matching (NFM), a new method that distills the quasi-deterministic coupling of pretrained NF models to train student flow models. These students achieve the best of both worlds: significantly outperforming flow models trained with independent or even OT couplings, while also improving on the teacher AR-NF model.
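For readers who want the regression loss in symbols, here is one common way to write it, assuming the linear interpolation path that is standard in the flow-matching literature. The notation (x_0 for noise, x_1 for data, π for the coupling, v_θ for the learned velocity field, T for the teacher's noise-to-data map) is an assumption of this note and may differ from the paper's.

```latex
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; (x_0, x_1) \sim \pi}
    \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2,
\qquad
x_t = (1 - t)\, x_0 + t\, x_1 .
```

In this notation, independent coupling corresponds to π = p_noise ⊗ p_data, an OT coupling picks π to (approximately) minimize a transport cost between noise and data, and a distilled coupling would concentrate π near pairs of the form (x_0, T(x_0)).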

Method

Figure: NFM method overview, from the paper. See the linked paper for full details.

Key Contributions

1. A paradigm shift: rather than computing adaptive couplings directly, distill couplings from a pretrained model that places noise and data in bijection, a property intrinsic to normalizing flows.

2. Normalized Flow Matching (NFM): distilling the quasi-deterministic coupling of pretrained autoregressive NF models into student flow models.

3. Students that beat flow models trained with independent or optimal-transport couplings, while also improving on the teacher AR-NF model.
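The difference between an independent coupling and a coupling distilled from an invertible teacher can be illustrated with a toy sketch. This is not the paper's implementation. The "teacher" here is a hypothetical fixed affine bijection standing in for a pretrained AR-NF, the student is replaced by a best affine fit at a single time t, and all function names are made up for illustration. Under these assumptions, the flow-matching target x - z is an exact function of the interpolant when pairs come from the teacher, while it stays noisy under independent pairing.

```python
import numpy as np

# Hypothetical stand-in for a pretrained invertible teacher (noise -> data).
# A real AR-NF teacher is a deep network; an affine bijection keeps the toy exact.
A = np.array([[2.0, 0.5], [0.0, 1.5]])
b = np.array([1.0, -1.0])


def teacher_noise_to_data(z):
    """Map noise samples to 'data' samples through the toy bijection."""
    return z @ A.T + b


def distilled_pairs(rng, n):
    """Deterministic coupling: each noise sample is paired with its teacher output."""
    z = rng.standard_normal((n, 2))
    return z, teacher_noise_to_data(z)


def independent_pairs(rng, n):
    """Independent coupling: noise and data are drawn separately."""
    z = rng.standard_normal((n, 2))
    x = teacher_noise_to_data(rng.standard_normal((n, 2)))
    return z, x


def fm_regression_problem(z, x, t):
    """Linear-path interpolant and its velocity target at a fixed time t."""
    x_t = (1.0 - t) * z + t * x
    target = x - z
    return x_t, target


def best_affine_fit_mse(x_t, target):
    """Mean squared residual of the least-squares affine map from x_t to target."""
    features = np.hstack([x_t, np.ones((len(x_t), 1))])
    weights, *_ = np.linalg.lstsq(features, target, rcond=None)
    residual = target - features @ weights
    return float(np.mean(np.sum(residual ** 2, axis=1)))


rng = np.random.default_rng(0)
t = 0.5
mse_distilled = best_affine_fit_mse(
    *fm_regression_problem(*distilled_pairs(rng, 4096), t)
)
mse_independent = best_affine_fit_mse(
    *fm_regression_problem(*independent_pairs(rng, 4096), t)
)
print(f"distilled coupling residual:   {mse_distilled:.3e}")
print(f"independent coupling residual: {mse_independent:.3e}")
```

In this toy setting the distilled residual is zero up to floating-point error, because the target is an affine function of the interpolant, whereas the independent residual stays well above zero. A real teacher is nonlinear and only quasi-deterministic, so the gap would be less clean in practice.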

BibTeX

@article{berthelot2026coupling,
  title   = {The Coupling Within: Flow Matching via Distilled Normalizing Flows},
  author  = {Berthelot, David and Chen, Tianrong and Gu, Jiatao and Cuturi, Marco and Dinh, Laurent and Chandna, Bhavik and Klein, Michal and Susskind, Joshua and Zhai, Shuangfei},
  journal = {arXiv preprint arXiv:2603.09014},
  year    = {2026}
}