ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles

Summary

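Automatically generating text with desired attributes is an ambitious task that researchers have long pursued.
Existing work has made steady progress in incorporating unimodal controls into language models (LMs), but how to generate controllable sentences from multimodal signals efficiently remains an open question.
To address this, we propose ZeroGen, a new paradigm for zero-shot controllable text generation with multimodal signals.
Specifically, ZeroGen applies text and image controls successively, from the token level to the sentence level, and maps them into a unified probability space at decoding time. This customizes the LM output through weighted addition, with no additional training.
To achieve a better inter-modal trade-off, we further introduce an effective dynamic weighting mechanism that regulates all control weights.
We also conduct extensive experiments to probe whether signals from different modalities relate in depth or in width.
Empirical results on three downstream tasks show that ZeroGen not only outperforms its counterparts on captioning tasks by a large margin but also shows great potential for multimodal news generation with a higher degree of control.
The code will be released at https://github.com/ImKeTT/ZeroGen.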
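
The decoding-time weighted addition described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the function names (`softmax`, `dynamic_weight`, `combine_controls`), the toy vocabulary, the scores, and in particular the overlap-based weighting rule are all assumptions made for illustration, since the summary does not specify how the dynamic weights are computed. The sketch also applies both controls at a single token step, whereas the summary describes controls ranging from the token level to the sentence level.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Map raw scores to a probability distribution over the vocabulary."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weight(base: float, lm_probs: List[float],
                   control_probs: List[float]) -> float:
    """Hypothetical rule (an assumption, not the paper's mechanism):
    raise a control's weight when it disagrees with the LM.
    `overlap` lies in [0, 1], so the result lies in [base, 2 * base]."""
    overlap = sum(min(p, c) for p, c in zip(lm_probs, control_probs))
    return base * (2.0 - overlap)


def combine_controls(lm_logits: List[float], text_scores: List[float],
                     image_scores: List[float],
                     alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """Put LM, text-control, and image-control scores in one probability
    space, add them with weights, and renormalize. No training involved."""
    lm = softmax(lm_logits)
    text = softmax(text_scores)
    image = softmax(image_scores)
    a = dynamic_weight(alpha, lm, text)
    b = dynamic_weight(beta, lm, image)
    mixed = [p + a * t + b * i for p, t, i in zip(lm, text, image)]
    total = sum(mixed)
    return [m / total for m in mixed]


# Toy next-token step: the LM prefers "dog", both controls prefer "car".
vocab = ["dog", "cat", "car", "tree"]
lm_logits = [2.0, 1.5, 0.5, 0.1]
text_scores = [0.1, 0.2, 3.0, 0.1]    # e.g. similarity to a keyword
image_scores = [0.2, 0.1, 2.5, 0.3]   # e.g. similarity to an image

probs = combine_controls(lm_logits, text_scores, image_scores)
best_token = vocab[max(range(len(vocab)), key=probs.__getitem__)]
print(best_token, [round(p, 3) for p in probs])
```

In this toy step the unmodified LM would pick "dog", while the combined distribution picks "car", which shows how the controls steer the output without changing any model parameters.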

Summary (original)

Automatically generating textual content with desired attributes is an ambitious task that people have pursued long. Existing works have made a series of progress in incorporating unimodal controls into language models (LMs), whereas how to generate controllable sentences with multimodal signals and high efficiency remains an open question. To tackle the puzzle, we propose a new paradigm of zero-shot controllable text generation with multimodal signals (\textsc{ZeroGen}). Specifically, \textsc{ZeroGen} leverages controls of text and image successively from token-level to sentence-level and maps them into a unified probability space at decoding, which customizes the LM outputs by weighted addition without extra training. To achieve better inter-modal trade-offs, we further introduce an effective dynamic weighting mechanism to regulate all control weights. Moreover, we conduct substantial experiments to probe the relationship of being in-depth or in-width between signals from distinct modalities. Encouraging empirical results on three downstream tasks show that \textsc{ZeroGen} not only outperforms its counterparts on captioning tasks by a large margin but also shows great potential in multimodal news generation with a higher degree of control. Our code will be released at https://github.com/ImKeTT/ZeroGen.

arXiv information

Authors	Haoqin Tu, Bowen Yang, Xianfeng Zhao
Published	2023-06-29 03:22:43+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
