Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

Summary

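Devising deep latent variable models for multimodal data is a long-standing theme in machine learning research. Multimodal variational autoencoders (VAEs) are a widely used class of generative models that learn latent representations jointly explaining multiple modalities. Various objective functions have been proposed for such models, often motivated as lower bounds on the log-likelihood of multimodal data or by information-theoretic considerations. To encode latent variables from different subsets of modalities, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes are routinely used, and they have been shown to yield different trade-offs, for example in generative quality or in consistency across modalities. In this work, we consider a variational bound that can tightly lower-bound the data log-likelihood. We develop more flexible aggregation schemes that generalize the PoE and MoE approaches by combining features encoded from different modalities with permutation-invariant neural networks. Our numerical experiments illustrate the trade-offs between multimodal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution of observed modalities and latent variables in identifiable models.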
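The PoE and MoE aggregation schemes mentioned above can be illustrated for one-dimensional Gaussian experts. The following is a minimal sketch under our own assumptions and is not the authors' implementation. We assume that each observed modality's encoder outputs a Gaussian expert as a (mean, variance) pair. The helper names `poe`, `moe_moments` and `moe_sample` are hypothetical. The optional standard normal prior expert is an assumed convention used by some PoE-based multimodal VAEs.

```python
import random
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) of a 1-D Gaussian expert


def poe(experts: List[Gaussian], include_prior: bool = True) -> Gaussian:
    """Product of Gaussian experts: precisions add, means are precision-weighted."""
    if include_prior:
        experts = experts + [(0.0, 1.0)]  # standard normal prior expert (assumed convention)
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def moe_moments(experts: List[Gaussian]) -> Gaussian:
    """Mean and variance of the uniformly weighted mixture of Gaussian experts."""
    k = len(experts)
    mean = sum(mu for mu, _ in experts) / k
    second_moment = sum(var + mu * mu for mu, var in experts) / k
    return mean, second_moment - mean * mean


def moe_sample(experts: List[Gaussian], rng: random.Random) -> float:
    """Sample from the mixture: pick one expert uniformly, then sample from it."""
    mu, var = rng.choice(experts)
    return rng.gauss(mu, var ** 0.5)


experts = [(1.0, 0.5), (-1.0, 0.5)]  # hypothetical experts from two modalities
print(poe(experts, include_prior=False))  # (0.0, 0.25)
print(moe_moments(experts))               # (0.0, 1.5)
print(moe_sample(experts, random.Random(0)))
```

For two experts N(1, 0.5) and N(-1, 0.5), the product without a prior expert is N(0, 0.25), which is sharper than either expert. The uniform mixture keeps both modes and has variance 1.5. This only shows mechanically why the two schemes behave differently. Which one is preferable depends on the model and the data.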

Summary (original)

Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations which jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. In order to encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational bound that can tightly lower bound the data log-likelihood. We develop more flexible aggregation schemes that generalise PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.
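The permutation-invariant aggregation described in the abstract can be sketched in Deep Sets form, ρ(Σ_m φ_m(x_m)), where the sum runs over the observed modalities. This is a minimal sketch under stated assumptions and is not the paper's architecture. We assume sum pooling, although the paper may use other permutation-invariant networks. We also treat each modality's encoder output as a given feature vector. The names `sum_pool`, `aggregate`, `natural_params`, `gaussian_from_natural` and `toy_rho`, and the modality labels "image" and "text", are all hypothetical. Choosing Gaussian natural parameters as the features and inverting them in ρ recovers a PoE without a prior expert. That is one sense in which such an aggregator generalizes PoE.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def sum_pool(features: Sequence[Vector]) -> Vector:
    """Coordinate-wise sum; math.fsum is correctly rounded, so the result is order-independent."""
    dim = len(features[0])
    return [math.fsum(f[d] for f in features) for d in range(dim)]


def aggregate(encoded: Dict[str, Vector], observed: Sequence[str],
              rho: Callable[[Vector], Vector]) -> Vector:
    """Deep Sets-style aggregation rho(sum of phi_m(x_m)) over the observed modality subset."""
    return rho(sum_pool([encoded[m] for m in observed]))


def natural_params(mean: float, var: float) -> Vector:
    """Hypothetical phi for a 1-D Gaussian expert: natural parameters (mean/var, 1/var)."""
    return [mean / var, 1.0 / var]


def gaussian_from_natural(eta: Vector) -> Vector:
    """rho mapping summed natural parameters back to [mean, variance]; this reproduces PoE."""
    precision = eta[1]
    return [eta[0] / precision, 1.0 / precision]


def toy_rho(s: Vector) -> Vector:
    """Hypothetical 'learned' rho with fixed toy weights; a real model would use an MLP."""
    mean = math.tanh(0.5 * s[0])
    var = math.log1p(math.exp(0.3 * s[1] - 1.0))  # softplus keeps the variance positive
    return [mean, var]


encoded = {"image": natural_params(1.0, 0.5), "text": natural_params(-1.0, 0.5)}
print(aggregate(encoded, ["image", "text"], gaussian_from_natural))  # [0.0, 0.25]
print(aggregate(encoded, ["text"], gaussian_from_natural))           # [-1.0, 0.5]
print(aggregate(encoded, ["text", "image"], toy_rho)
      == aggregate(encoded, ["image", "text"], toy_rho))             # True
```

Because `math.fsum` is correctly rounded, the pooled features, and therefore the output, do not depend on the order in which modalities are listed. With plain `sum`, this holds only up to floating-point rounding. Encoding a modality subset simply means pooling over fewer feature vectors. A trained model would replace `toy_rho` and the feature maps with neural networks.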

arXiv information

Authors	Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega
Published	2023-09-01 10:32:21+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, DeepL
