A General Retrieval-Augmented Generation Framework for Multimodal Case-Based Reasoning Applications

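Summary

Case-based reasoning (CBR) is an experience-based approach to problem solving in which a repository of solved cases is adapted to solve new cases.
Recent research shows that large language models (LLMs) with retrieval-augmented generation (RAG) can support the Retrieve and Reuse stages of the CBR pipeline by retrieving similar cases and using them as additional context for an LLM query.
Most studies focus on text-only applications, but in many real-world problems the components of a case are multimodal.
This paper presents MCBR-RAG, a general RAG framework for multimodal CBR applications.
The MCBR-RAG framework converts non-text case components into text-based representations, which allows it to 1) learn application-specific latent representations that can be indexed for retrieval, and 2) enrich the query provided to the LLM by incorporating all case components for better context.
The effectiveness of MCBR-RAG is demonstrated through experiments on a simplified Math-24 application and a more complex Backgammon application.
The empirical results show that MCBR-RAG improves generation quality compared with a baseline LLM that receives no contextual information.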

Abstract (original)

Case-based reasoning (CBR) is an experience-based approach to problem solving, where a repository of solved cases is adapted to solve new cases. Recent research shows that Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) can support the Retrieve and Reuse stages of the CBR pipeline by retrieving similar cases and using them as additional context to an LLM query. Most studies have focused on text-only applications, however, in many real-world problems the components of a case are multimodal. In this paper we present MCBR-RAG, a general RAG framework for multimodal CBR applications. The MCBR-RAG framework converts non-text case components into text-based representations, allowing it to: 1) learn application-specific latent representations that can be indexed for retrieval, and 2) enrich the query provided to the LLM by incorporating all case components for better context. We demonstrate MCBR-RAG’s effectiveness through experiments conducted on a simplified Math-24 application and a more complex Backgammon application. Our empirical results show that MCBR-RAG improves generation quality compared to a baseline LLM with no contextual information provided.
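The retrieve-and-enrich flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: `to_text` stands in for whatever model converts a non-text case component into text, the bag-of-tokens `embed` function stands in for a learned application-specific latent representation, and the tiny Math-24 style case base, the function names, and the prompt wording are all hypothetical. No LLM is called; the sketch stops at building the enriched query.

```python
import math
from collections import Counter

def to_text(cards):
    """Hypothetical stand-in for converting a non-text component to text."""
    return " ".join(str(c) for c in cards)

def embed(text):
    """Stand-in for a learned latent representation: token counts."""
    return Counter(text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical solved cases: four numbers and an expression that equals 24.
CASE_BASE = [
    {"cards": (1, 2, 3, 4), "solution": "1 * 2 * 3 * 4"},
    {"cards": (2, 3, 4, 6), "solution": "6 * 4 * (3 - 2)"},
    {"cards": (3, 3, 8, 8), "solution": "8 / (3 - 8 / 3)"},
]

def retrieve(query_cards, k=1):
    """Retrieve step: rank solved cases by similarity to the new case."""
    q = embed(to_text(query_cards))
    ranked = sorted(
        CASE_BASE,
        key=lambda case: cosine(q, embed(to_text(case["cards"]))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query_cards, retrieved):
    """Reuse step: add the retrieved cases as context to the LLM query."""
    lines = ["Solved cases:"]
    for case in retrieved:
        lines.append(f"- numbers {to_text(case['cards'])}: {case['solution']} = 24")
    lines.append(f"New case: make 24 from the numbers {to_text(query_cards)}.")
    return "\n".join(lines)

retrieved = retrieve((3, 8, 8, 9), k=1)
prompt = build_prompt((3, 8, 8, 9), retrieved)
print(prompt)
```

The baseline mentioned in the abstract would correspond to sending only the final "New case" line to the LLM, without the retrieved cases.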

arXiv information

Author	Ofir Marom
Published	2025-01-09 07:41:22+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
