Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

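Abstract

As large language models (LLMs) extend natural language processing to long inputs, rigorous and systematic analysis is needed to understand their capabilities and behavior.
Summarization is a salient application because it is both ubiquitous and controversial (for example, researchers have declared the death of summarization).
Financial reports are not only long but also make heavy use of numbers and tables, so this paper takes financial report summarization as a case study.
We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command.
We find that GPT-3.5 and Command fail to perform this summarization task meaningfully.
For Claude 2 and GPT-4, we analyze the extractiveness of the summaries and identify a position bias in LLMs.
For Claude, this position bias disappears when the input is shuffled, which suggests that Claude is able to recognize important information.
We also conduct a comprehensive investigation of how numeric data is used in LLM-generated summaries and offer a taxonomy of numeric hallucinations.
We use prompt engineering to improve GPT-4's use of numbers, with limited success.
Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared with GPT-4.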

Abstract (original)

As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports not only are long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command. We find that GPT-3.5 and Command fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude has the ability to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4’s use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4.

arXiv information

Authors	Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan
Published	2024-05-08 04:36:07+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
