Abstract: |
Recent efforts have extended the flow-matching framework to discrete generative modeling. One strand of models works directly with continuous probabilities instead of discrete tokens, which we colloquially refer to as Continuous-State Discrete Flow Matching (CS-DFM). Inspired by the mathematical theory of information geometry, we propose statistical flow matching, which operates on the family of parameterized distributions with the canonical Riemannian geometry defined by the Fisher information metric. Furthermore, we demonstrate that statistical flow matching and other CS-DFM models with different geometric assumptions can be unified under our α-flow framework, which operates on the canonical α-geometry of the statistical manifold. Theoretically, we show that the flow matching loss for α-flow establishes a unified variational bound for the discrete negative log-likelihood, with additional optimality guaranteed from a kinetic-optimal perspective. We comprehensively evaluate different instantiations of α-flow for unconditional protein generation, demonstrating their effectiveness and superior performance in terms of pLDDT scores and distribution fitness compared to existing Markov-chain-based DFM models. |
Biography: |
Chaoran Cheng is a fourth-year PhD candidate in Computer Science at UIUC, advised by Professor Ge Liu. His research interests span multiple aspects of generative modeling, with a special theoretical interest in manifold learning, where desirable network behaviors are mathematically inspired and guaranteed. Chaoran's work extends to various generative tasks in AI4Science domains, combining continuous and discrete modalities within a more comprehensive and versatile generative model for downstream applications, including protein sequence-structure co-design and structure-based drug design. |