jarxiv | Japanese arxiv | Page 852

C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset

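Posted: April 21, 2025 | Author: jarxiv

Summary

Stance detection has become an essential tool for analyzing public discussion on social media.
Current methods face significant challenges, particularly in Chinese language processing and multi-turn conversational analysis.
To address these limitations, we introduce C-MTCSD, the largest Chinese multi-turn conversational stance detection dataset. It comprises 24,264 carefully annotated instances from Sina Weibo, making it 4.2 times larger than the only prior Chinese conversational stance detection dataset.
Our comprehensive evaluation with both traditional approaches and large language models reveals the complexity of C-MTCSD: even state-of-the-art models reach only a 64.07% F1 score in the challenging zero-shot setting, and performance consistently degrades as conversation depth increases.
Traditional models struggle in particular with implicit stance detection, scoring below 50% F1.
This work establishes a challenging new benchmark for Chinese stance detection research and highlights significant opportunities for future improvement.

Abstract (original)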

Stance detection has become an essential tool for analyzing public discussions on social media. Current methods face significant challenges, particularly in Chinese language processing and multi-turn conversational analysis. To address these limitations, we introduce C-MTCSD, the largest Chinese multi-turn conversational stance detection dataset, comprising 24,264 carefully annotated instances from Sina Weibo, which is 4.2 times larger than the only prior Chinese conversational stance detection dataset. Our comprehensive evaluation using both traditional approaches and large language models reveals the complexity of C-MTCSD: even state-of-the-art models achieve only 64.07% F1 score in the challenging zero-shot setting, while performance consistently degrades with increasing conversation depth. Traditional models particularly struggle with implicit stance detection, achieving below 50% F1 score. This work establishes a challenging new benchmark for Chinese stance detection research, highlighting significant opportunities for future improvements.

arXiv information

Authors	Fuqiang Niu, Yi Yang, Xianghua Fu, Genan Dai, Bowen Zhang
Published	2025-04-18 16:44:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

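Posted: April 21, 2025 | Author: jarxiv

Summary

Surrogate models are used to predict the behavior of complex energy systems that are too expensive to simulate with traditional numerical methods.
Our work introduces language descriptions, which we call "system captions" or SysCaps, as an interface to such surrogates.
We argue that interacting with surrogates through text, particularly natural language, makes these models more accessible to both experts and non-experts.
We introduce a lightweight multimodal text and timeseries regression model, along with a training pipeline that uses large language models (LLMs) to synthesize high-quality captions from simulation metadata.
Experiments on two real-world simulators, one of buildings and one of wind farms, show that SysCaps-augmented surrogates are more accurate on held-out systems than traditional methods while gaining new generalization abilities, such as handling semantically related descriptions of the same test system.
Additional experiments highlight the potential of SysCaps to unlock language-driven design space exploration and to regularize training through prompt augmentation.

Abstract (original)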

Surrogate models are used to predict the behavior of complex energy systems that are too expensive to simulate with traditional numerical methods. Our work introduces the use of language descriptions, which we call “system captions” or SysCaps, to interface with such surrogates. We argue that interacting with surrogates through text, particularly natural language, makes these models more accessible for both experts and non-experts. We introduce a lightweight multimodal text and timeseries regression model and a training pipeline that uses large language models (LLMs) to synthesize high-quality captions from simulation metadata. Our experiments on two real-world simulators of buildings and wind farms show that our SysCaps-augmented surrogates have better accuracy on held-out systems than traditional methods while enjoying new generalization abilities, such as handling semantically related descriptions of the same test system. Additional experiments also highlight the potential of SysCaps to unlock language-driven design space exploration and to regularize training through prompt augmentation.

arXiv information

Authors	Patrick Emami, Zhaonan Li, Saumya Sinha, Truc Nguyen
Published	2025-04-18 16:49:44+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL, cs.LG, cs.SY, eess.SY

From Token to Line: Enhancing Code Generation with a Long-Term Perspective

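Posted: April 21, 2025 | Author: jarxiv

Summary

The emergence of large language models (LLMs) has significantly advanced code generation and sparked a surge of related literature.
Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term.
Existing studies try to alleviate the issue with multi-token prediction strategies, but little attention has been paid to choosing an appropriate processing length for generation.
Analyzing the attention between tokens during LLM generation shows that high spikes in attention scores typically appear at the ends of lines.
This suggests that it is reasonable to treat each line of code as a fundamental processing unit and to generate lines sequentially.
Inspired by this, we propose the LSR-MCTS algorithm, which uses MCTS to determine the code line by line and select the optimal path.
Further, we integrate a self-refine mechanism at each node to enhance diversity and generate higher-quality programs through error correction.
Extensive experiments and comprehensive analyses on three public coding benchmarks show that the method outperforms state-of-the-art approaches.

Abstract (original)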

The emergence of large language models (LLMs) has significantly promoted the development of code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limited focus on choosing the appropriate processing length for generations. By analyzing the attention between tokens during the generation process of LLMs, it can be observed that the high spikes of the attention scores typically appear at the end of lines. This insight suggests that it is reasonable to treat each line of code as a fundamental processing unit and generate them sequentially. Inspired by this, we propose the LSR-MCTS algorithm, which leverages MCTS to determine the code line-by-line and select the optimal path. Further, we integrate a self-refine mechanism at each node to enhance diversity and generate higher-quality programs through error correction. Extensive experiments and comprehensive analyses on three public coding benchmarks demonstrate that our method outperforms the state-of-the-art performance approaches.

arXiv information

Authors	Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Bingxu An, Zhao Wei, Yong Xu
Published	2025-04-18 17:03:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

A-MEM: Agentic Memory for LLM Agents

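Posted: April 21, 2025 | Author: jarxiv

Summary

Large language model (LLM) agents can effectively use external tools for complex real-world tasks, but they need memory systems to leverage historical experiences.
Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases.
Moreover, the fixed operations and structures of these systems limit their adaptability across diverse tasks.
To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way.
Following the basic principles of the Zettelkasten method, we designed the memory system to create interconnected knowledge networks through dynamic indexing and linking.
When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags.
The system then analyzes historical memories to identify relevant connections and establishes links where meaningful similarities exist.
This process also enables memory evolution: as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding.
Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing more adaptive and context-aware memory management.
Empirical experiments on six foundation models show superior improvement over existing SOTA baselines.
The source code for evaluating performance is available at https://github.com/WujiangXu/AgenticMemory, and the source code of the agentic memory system is available at https://github.com/agiresearch/A-mem.

Abstract (original)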

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems’ fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution – as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/AgenticMemory, while the source code of agentic memory system is available at https://github.com/agiresearch/A-mem.
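
The note structure and linking step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MemoryNote`, `MemoryStore`, and `link_threshold` are hypothetical, and a plain Jaccard overlap of keywords and tags stands in for the LLM-driven analysis that A-MEM uses to decide on links and updates.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNote:
    """One memory note carrying the structured attributes named in the abstract."""
    note_id: int
    content: str
    context: str                       # contextual description
    keywords: set[str]
    tags: set[str]
    links: set[int] = field(default_factory=set)  # ids of linked notes


def jaccard(a: set[str], b: set[str]) -> float:
    """Set-overlap similarity; an assumed stand-in for an LLM relevance check."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class MemoryStore:
    """Hypothetical store that links each new note to similar older notes."""

    def __init__(self, link_threshold: float = 0.25):
        self.notes: dict[int, MemoryNote] = {}
        self.link_threshold = link_threshold

    def add(self, content: str, context: str, keywords, tags) -> MemoryNote:
        note = MemoryNote(len(self.notes), content, context,
                          set(keywords), set(tags))
        for old in self.notes.values():
            sim = jaccard(note.keywords | note.tags, old.keywords | old.tags)
            if sim >= self.link_threshold:
                # Link in both directions.
                note.links.add(old.note_id)
                old.links.add(note.note_id)
                # Crude "memory evolution": the old note inherits new tags.
                old.tags |= note.tags
        self.notes[note.note_id] = note
        return note


store = MemoryStore()
flight = store.add("Booked flight to Tokyo", "travel planning",
                   {"flight", "tokyo"}, {"travel"})
hotel = store.add("Reserved hotel in Tokyo", "travel planning",
                  {"hotel", "tokyo"}, {"travel", "lodging"})
bugfix = store.add("Fixed failing unit test", "coding",
                   {"pytest", "bug"}, {"code"})
```

In this toy run the two travel notes end up linked to each other and the older one picks up the `lodging` tag, while the unrelated coding note stays unlinked.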

arXiv information

Authors	Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang
Published	2025-04-18 17:26:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL, cs.HC

Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

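Posted: April 21, 2025 | Author: jarxiv

Summary

Understanding the knowledge boundaries of LLMs is crucial for preventing hallucination, yet research on LLM knowledge boundaries has predominantly focused on English.
In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages, by probing their internal representations as they process known and unknown questions in multiple languages.
Our empirical studies reveal three key findings. 1) LLMs' perceptions of knowledge boundaries are encoded in the middle to middle-upper layers across different languages.
2) Language differences in knowledge boundary perception follow a linear structure, which motivates our proposal of a training-free alignment method that effectively transfers knowledge boundary perception ability across languages and thereby helps reduce hallucination risk in low-resource languages.
3) Fine-tuning on bilingual question pair translation further enhances LLMs' recognition of knowledge boundaries across languages.
Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data.
Our code and datasets are publicly available at https://github.com/DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries.

Abstract (original)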

While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages. Our empirical studies reveal three key findings: 1) LLMs’ perceptions of knowledge boundaries are encoded in the middle to middle-upper layers across different languages. 2) Language differences in knowledge boundary perception follow a linear structure, which motivates our proposal of a training-free alignment method that effectively transfers knowledge boundary perception ability across languages, thereby helping reduce hallucination risk in low-resource languages; 3) Fine-tuning on bilingual question pair translation further enhances LLMs’ recognition of knowledge boundaries across languages. Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data. Our code and datasets are publicly available at https://github.com/DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries.

arXiv information

Authors	Chenghao Xiao, Hou Pong Chan, Hao Zhang, Mahani Aljunied, Lidong Bing, Noura Al Moubayed, Yu Rong
Published	2025-04-18 17:44:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

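Posted: April 21, 2025 | Author: jarxiv

Summary

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy.
It has demonstrated substantial advances in applications including image classification, object detection, language modeling, text classification, and sentiment analysis.
Recent innovations in KD methods, such as attention-based approaches, block-wise logit distillation, and decoupling distillation, have notably improved student model performance.
These techniques focus on stimulus complexity, attention mechanisms, and global information capture to optimize knowledge transfer.
In addition, KD has proven effective for compressing large language models while preserving accuracy, reducing computational overhead, and improving inference speed.
This survey synthesizes the latest literature and highlights key findings, contributions, and future directions in knowledge distillation, offering researchers and practitioners insight into its evolving role in artificial intelligence and machine learning.

Abstract (original)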

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various applications including image classification, object detection, language modeling, text classification, and sentiment analysis. Recent innovations in KD methods, such as attention-based approaches, block-wise logit distillation, and decoupling distillation, have notably improved student model performance. These techniques focus on stimulus complexity, attention mechanisms, and global information capture to optimize knowledge transfer. In addition, KD has proven effective in compressing large language models while preserving accuracy, reducing computational overhead, and improving inference speed. This survey synthesizes the latest literature, highlighting key findings, contributions, and future directions in knowledge distillation to provide insights for researchers and practitioners on its evolving role in artificial intelligence and machine learning.
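
As a concrete reference point for the teacher-to-student transfer described above, the sketch below computes the classic soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is the widely used baseline formulation and is an assumption here; the survey abstract does not single it out, and the function names and default temperature are illustrative only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times temperature**2."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

A higher temperature flattens both distributions, so the student is also trained on the relative probabilities the teacher assigns to incorrect classes. In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels.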

arXiv information

Authors	Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu
Published	2025-04-18 17:54:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL, cs.LG

Science Hierarchography: Hierarchical Organization of Science Literature

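Posted: April 21, 2025 | Author: jarxiv

Summary

Scientific knowledge is growing rapidly, making it challenging to track progress and high-level conceptual links across broad disciplines.
Existing tools such as citation networks and search engines make it easy to access a few related papers, but they fundamentally lack the flexible abstraction needed to represent the density of activity in various scientific subfields.
We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure that allows scientific work to be categorized across varying levels of abstraction, from very broad fields to very specific studies.
Such a representation can provide insight into which fields are well explored and which are under-explored.
To achieve the goals of SCIENCE HIERARCHOGRAPHY, we develop a range of algorithms.
Our primary approach combines fast embedding-based clustering with LLM-based prompting, balancing the computational efficiency of embedding methods against the semantic precision offered by LLM prompting.
We demonstrate that this approach offers the best trade-off between quality and speed compared to methods that rely heavily on LLM prompting, such as iterative tree construction with LLMs.
To better reflect the interdisciplinary and multifaceted nature of research papers, our hierarchy captures multiple dimensions of categorization beyond simple topic labels.
We evaluate the utility of the framework by assessing how effectively an LLM-based agent can locate target papers using the hierarchy.
Results show that this structured approach enhances interpretability, supports trend discovery, and offers an alternative pathway for exploring scientific literature beyond traditional search methods.
Code, data, and demo: https://github.com/JHU-CLSP/science-hierarchography

Abstract (original)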

Scientific knowledge is growing rapidly, making it challenging to track progress and high-level conceptual links across broad disciplines. While existing tools like citation networks and search engines make it easy to access a few related papers, they fundamentally lack the flexible abstraction needed to represent the density of activity in various scientific subfields. We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure that allows for the categorization of scientific work across varying levels of abstraction, from very broad fields to very specific studies. Such a representation can provide insights into which fields are well-explored and which are under-explored. To achieve the goals of SCIENCE HIERARCHOGRAPHY, we develop a range of algorithms. Our primary approach combines fast embedding-based clustering with LLM-based prompting to balance the computational efficiency of embedding methods with the semantic precision offered by LLM prompting. We demonstrate that this approach offers the best trade-off between quality and speed compared to methods that heavily rely on LLM prompting, such as iterative tree construction with LLMs. To better reflect the interdisciplinary and multifaceted nature of research papers, our hierarchy captures multiple dimensions of categorization beyond simple topic labels. We evaluate the utility of our framework by assessing how effectively an LLM-based agent can locate target papers using the hierarchy. Results show that this structured approach enhances interpretability, supports trend discovery, and offers an alternative pathway for exploring scientific literature beyond traditional search methods. Code, data and demo: https://github.com/JHU-CLSP/science-hierarchography
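
The combination of cheap embedding-based clustering with a labeling step can be illustrated with a toy sketch of one hierarchy level. Everything here is an assumption made for illustration: the greedy cosine-threshold clustering is a simple stand-in for whatever clustering the authors use, and `label_cluster` is a hypothetical callable where a real system would prompt an LLM for a cluster summary.

```python
import math
from typing import Callable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_embeddings(embeddings, threshold=0.8):
    """Greedy one-pass clustering: join the first centroid that is close enough."""
    clusters, centroids = [], []
    for i, vec in enumerate(embeddings):
        for c, centroid in enumerate(centroids):
            if cosine(vec, centroid) >= threshold:
                clusters[c].append(i)
                n = len(clusters[c])
                # Update the running mean of the cluster.
                centroids[c] = [(x * (n - 1) + y) / n
                                for x, y in zip(centroid, vec)]
                break
        else:
            clusters.append([i])
            centroids.append(list(vec))
    return clusters


def build_level(titles, embeddings,
                label_cluster: Callable[[list], str], threshold=0.8):
    """Return (label, member titles) pairs for one level of the hierarchy."""
    level = []
    for members in cluster_embeddings(embeddings, threshold):
        member_titles = [titles[i] for i in members]
        # In a real pipeline this call would be an LLM prompt (assumption).
        level.append((label_cluster(member_titles), member_titles))
    return level


titles = ["Paper A", "Paper B", "Paper C"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
level = build_level(titles, embeddings, lambda ts: f"{len(ts)} papers")
```

Only one labeling call is made per cluster rather than per paper, which is the kind of cost saving the abstract attributes to pairing embedding methods with LLM prompting. Applying `build_level` again to the cluster centroids would yield a coarser level above this one.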

arXiv information

Authors	Muhan Gao, Jash Shah, Weiqi Wang, Daniel Khashabi
Published	2025-04-18 17:59:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

Divergent LLM Adoption and Heterogeneous Convergence Paths in Research Writing

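Posted: April 21, 2025 | Author: jarxiv

Summary

Large language models (LLMs) such as ChatGPT are reshaping content creation and academic writing.
This study investigates the impact of AI-assisted generative revisions on research manuscripts, focusing on heterogeneous adoption patterns and their influence on writing convergence.
Using a dataset of over 627,000 academic papers from arXiv, we develop a novel classification framework by fine-tuning prompt- and discipline-specific large language models to detect the style of ChatGPT-revised texts.
Our findings reveal substantial disparities in LLM adoption across academic disciplines, gender, native language status, and career stage, alongside a rapid evolution in scholarly writing styles.
Moreover, LLM usage enhances clarity, conciseness, and adherence to formal writing conventions, with improvements varying by revision type.
Finally, a difference-in-differences analysis shows that while LLMs drive convergence in academic writing, early adopters, male researchers, non-native speakers, and junior scholars show the most pronounced stylistic shifts, bringing their writing closer to that of established researchers.

Abstract (original)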

Large Language Models (LLMs), such as ChatGPT, are reshaping content creation and academic writing. This study investigates the impact of AI-assisted generative revisions on research manuscripts, focusing on heterogeneous adoption patterns and their influence on writing convergence. Leveraging a dataset of over 627,000 academic papers from arXiv, we develop a novel classification framework by fine-tuning prompt- and discipline-specific large language models to detect the style of ChatGPT-revised texts. Our findings reveal substantial disparities in LLM adoption across academic disciplines, gender, native language status, and career stage, alongside a rapid evolution in scholarly writing styles. Moreover, LLM usage enhances clarity, conciseness, and adherence to formal writing conventions, with improvements varying by revision type. Finally, a difference-in-differences analysis shows that while LLMs drive convergence in academic writing, early adopters, male researchers, non-native speakers, and junior scholars exhibit the most pronounced stylistic shifts, aligning their writing more closely with that of established researchers.

arXiv information

Authors	Cong William Lin, Wu Zhu
Published	2025-04-18 11:09:16+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, econ.GN, q-fin.EC

Multi-modal Knowledge Graph Generation with Semantics-enriched Prompts

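Posted: April 21, 2025 | Author: jarxiv

Summary

Multi-modal Knowledge Graphs (MMKGs) have been widely applied across various domains for knowledge representation.
However, far fewer MMKGs exist than are needed, and their construction faces numerous challenges, particularly in ensuring that high-quality, contextually relevant images are selected for knowledge graph enrichment.
To address these challenges, we present a framework for constructing MMKGs from conventional KGs.
Furthermore, to generate higher-quality images that are more relevant to the context of the given knowledge graph, we designed a neighbor selection method called Visualizable Structural Neighbor Selection (VSNS).
The method consists of two modules: Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS).
The VNS module filters out relations that are difficult to visualize, while the SNS module selects the neighbors that most effectively capture the structural characteristics of the entity.
To evaluate the quality of the generated images, we performed qualitative and quantitative evaluations on two datasets, MKG-Y and DB15K.
The experimental results indicate that selecting neighbors with the VSNS method yields higher-quality images that are more relevant to the knowledge graph.

Abstract (original)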

Multi-modal Knowledge Graphs (MMKGs) have been widely applied across various domains for knowledge representation. However, the existing MMKGs are significantly fewer than required, and their construction faces numerous challenges, particularly in ensuring the selection of high-quality, contextually relevant images for knowledge graph enrichment. To address these challenges, we present a framework for constructing MMKGs from conventional KGs. Furthermore, to generate higher-quality images that are more relevant to the context in the given knowledge graph, we designed a neighbor selection method called Visualizable Structural Neighbor Selection (VSNS). This method consists of two modules: Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS). The VNS module filters relations that are difficult to visualize, while the SNS module selects neighbors that most effectively capture the structural characteristics of the entity. To evaluate the quality of the generated images, we performed qualitative and quantitative evaluations on two datasets, MKG-Y and DB15K. The experimental results indicate that using the VSNS method to select neighbors results in higher-quality images that are more relevant to the knowledge graph.

arXiv information

Authors	Yajing Xu, Zhiqiang Liu, Jiaoyan Chen, Mingchen Tu, Zhuo Chen, Jeff Z. Pan, Yichi Zhang, Yushan Zhu, Wen Zhang, Huajun Chen
Published	2025-04-18 11:12:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI

Argumentative Large Language Models for Explainable and Contestable Claim Verification

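Posted: April 21, 2025 | Author: jarxiv

Summary

The profusion of knowledge encoded in large language models (LLMs), together with their ability to apply this knowledge zero-shot in a range of settings, makes them promising candidates for use in decision-making.
However, they are currently limited by their inability to provide outputs that can be faithfully explained and effectively contested to correct mistakes.
In this paper, we attempt to reconcile these strengths and weaknesses by introducing argumentative LLMs (ArgLLMs), a method for augmenting LLMs with argumentative reasoning.
Concretely, ArgLLMs construct argumentation frameworks, which then serve as the basis for formal reasoning in support of decision-making.
Because these argumentation frameworks and the formal reasoning are interpretable, any decision made by ArgLLMs can be explained and contested.
We evaluate ArgLLMs' performance experimentally against state-of-the-art techniques on the decision-making task of claim verification.
We also define novel properties that characterise contestability and assess ArgLLMs formally in terms of these properties.

Abstract (original)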

The profusion of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings makes them promising candidates for use in decision-making. However, they are currently limited by their inability to provide outputs which can be faithfully explained and effectively contested to correct mistakes. In this paper, we attempt to reconcile these strengths and weaknesses by introducing argumentative LLMs (ArgLLMs), a method for augmenting LLMs with argumentative reasoning. Concretely, ArgLLMs construct argumentation frameworks, which then serve as the basis for formal reasoning in support of decision-making. The interpretable nature of these argumentation frameworks and formal reasoning means that any decision made by ArgLLMs may be explained and contested. We evaluate ArgLLMs’ performance experimentally in comparison with state-of-the-art techniques, in the context of the decision-making task of claim verification. We also define novel properties to characterise contestability and assess ArgLLMs formally in terms of these properties.

arXiv information

Authors	Gabriel Freedman, Adam Dejl, Deniz Gorur, Xiang Yin, Antonio Rago, Francesca Toni
Published	2025-04-18 11:20:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, I.2.7
