News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News

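Abstract

Large language models (LLMs) have rapidly become an essential tool for many conversational chatbots because they can give coherent answers to a wide range of queries.
The datasets used to train these LLMs are often a mix of generic and synthetic samples, and so lack the verification needed to provide accurate, verifiable answers about TV news.
We collect and share a large collection of QA pairs extracted from transcripts of news recordings from various news channels across the United States.
The resulting QA pairs are used to fine-tune an off-the-shelf LLM.
Our model outperforms base models of similar size on several open LLM benchmarks.
We further propose and integrate a RAG method that improves the contextualization of answers and points each answer to a verifiable news recording.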

Abstract (original)

Large Language Models (LLMs) have fast become an essential tool for many conversational chatbots due to their ability to provide coherent answers for varied queries. Datasets used to train these LLMs are often a mix of generic and synthetic samples, thus lacking the verification needed to provide correct and verifiable answers for T.V. News. We collect and share a large collection of QA pairs extracted from transcripts of news recordings from various news-channels across the United States. Resultant QA pairs are then used to fine-tune an off-the-shelf LLM model. Our model surpasses base models of similar size on several open LLM benchmarks. We further integrate and propose a RAG method to improve contextualization of our answers and also point it to a verifiable news recording.
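
The abstract does not say how the RAG step retrieves transcripts, so the following is only a minimal sketch of the general idea: find the transcript snippets closest to a question and attach their recording IDs so an answer can point back to a verifiable recording. Everything here is an assumption made for illustration: the `TRANSCRIPTS` data, the `recording_id` field, the function names, and the use of bag-of-words cosine similarity as a stand-in for whatever retriever the authors actually use.

```python
import math
import re
from collections import Counter

# Hypothetical transcript snippets; the IDs and text are invented for illustration.
TRANSCRIPTS = [
    {"recording_id": "rec-001",
     "text": "The city council approved a new budget for road repairs."},
    {"recording_id": "rec-002",
     "text": "A winter storm is expected to bring heavy snow to the region."},
    {"recording_id": "rec-003",
     "text": "Local schools will close early because of the winter storm."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return up to k snippets with non-zero similarity, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, hits):
    """Assemble a prompt whose context lines carry their recording IDs."""
    context = "\n".join(f"[{d['recording_id']}] {d['text']}" for d in hits)
    return (
        "Answer using only the context and cite the recording ID.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Calling `retrieve(question, TRANSCRIPTS)` and passing the result to `build_prompt` gives a prompt that could be sent to a fine-tuned model; because each context line keeps its recording ID, the model's answer can cite the recording it relied on.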

arXiv information

Authors	Tarun Jain, Yufei Gao, Sridhar Vanga, Karan Singla
Published	2024-11-06 16:17:21+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
