jarxiv | Japanese arxiv | Page 1432

Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding

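Posted on March 10, 2025 by jarxiv

Summary

State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies to fulfil user queries.
Most task-oriented dialogue data, such as customer service recordings, comes without an ontology or annotations.
Such ontologies are usually built by hand, which limits where specialised systems can be applied.
Dialogue ontology construction is an approach for automating that process and typically consists of two steps: term extraction and relation extraction.
In this work, we focus on relation extraction in a transfer learning set-up.
To improve generalisation, we propose an extension to the decoding mechanism of large language models.
We adapt Chain-of-Thought (CoT) decoding, recently developed for reasoning problems, to generative relation extraction.
Here, we generate multiple branches in the decoding space and select relations based on a confidence threshold.
By constraining the decoding to ontology terms and relations, we aim to reduce the risk of hallucination.
Extensive experiments on two widely used datasets show improved performance on the target ontology for both source fine-tuned and one-shot prompted large language models.

Abstract (original)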

State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies for fulfilling user queries. The majority of task-oriented dialogue data, such as customer service recordings, comes without ontology and annotation. Such ontologies are normally built manually, limiting the application of specialised systems. Dialogue ontology construction is an approach for automating that process and typically consists of two steps: term extraction and relation extraction. In this work, we focus on relation extraction in a transfer learning set-up. To improve the generalisation, we propose an extension to the decoding mechanism of large language models. We adapt Chain-of-Thought (CoT) decoding, recently developed for reasoning problems, to generative relation extraction. Here, we generate multiple branches in the decoding space and select the relations based on a confidence threshold. By constraining the decoding to ontology terms and relations, we aim to decrease the risk of hallucination. We conduct extensive experimentation on two widely used datasets and find improvements in performance on target ontology for source fine-tuned and one-shot prompted large language models.
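The branch selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Branch` structure, the relation name `has_slot`, the example terms, the 0.5 threshold, and the confidence score (the mean margin between the top-1 and top-2 token probabilities, in the spirit of CoT decoding) are all assumptions made for illustration. The ontology constraint is applied here as a post-hoc filter on finished branches, whereas the paper constrains the decoding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One decoded branch: a candidate relation plus token-level probabilities."""
    head: str
    relation: str
    tail: str
    # (top-1 probability, top-2 probability) at each decoded answer token
    token_probs: tuple


def confidence(branch):
    """Mean margin between the top-1 and top-2 token probabilities (assumed score)."""
    margins = [p1 - p2 for p1, p2 in branch.token_probs]
    return sum(margins) / len(margins)


def select_relations(branches, terms, relations, threshold):
    """Keep in-ontology relations whose branch confidence reaches the threshold."""
    selected = {}
    for b in branches:
        # Post-hoc stand-in for constrained decoding: drop out-of-ontology output.
        if b.head not in terms or b.tail not in terms or b.relation not in relations:
            continue
        score = confidence(b)
        if score >= threshold:
            key = (b.head, b.relation, b.tail)
            selected[key] = max(score, selected.get(key, 0.0))
    return selected


# Hypothetical ontology and branches, for illustration only.
terms = {"hotel", "price range", "area"}
relations = {"has_slot"}
branches = [
    Branch("hotel", "has_slot", "price range", ((0.90, 0.05), (0.80, 0.10))),
    Branch("hotel", "has_slot", "area", ((0.50, 0.40), (0.45, 0.35))),
    Branch("hotel", "has_slot", "swimming pool", ((0.95, 0.01),)),
]
selected = select_relations(branches, terms, relations, threshold=0.5)
print(sorted(selected))
```

In this toy run only the first branch survives: the second falls below the threshold and the third mentions a term outside the ontology.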

arXiv information

Authors	Renato Vukovic, David Arps, Carel van Niekerk, Benjamin Matthias Ruppik, Hsien-Chin Lin, Michael Heck, Milica Gašić
Published	2025-03-07 11:12:17+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models

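Posted on March 10, 2025 by jarxiv

Summary

This paper investigates the role of dynamic external knowledge integration in improving counter-argument generation with large language models (LLMs).
LLMs have shown promise on argumentative tasks, but their tendency to produce long and potentially non-factual responses highlights the need for more controlled, evidence-based approaches.
We introduce a new manually curated dataset of argument and counter-argument pairs, designed specifically to balance argumentative complexity against evaluative feasibility.
We also propose a new LLM-as-a-Judge evaluation methodology that correlates more strongly with human judgments than traditional reference-based metrics.
Our experimental results show that integrating dynamic external knowledge from the web significantly improves the quality of generated counter-arguments, particularly in terms of relatedness, persuasiveness, and factuality.
The findings suggest that combining LLMs with real-time external knowledge retrieval is a promising direction for building more effective and reliable counter-argumentation systems.

Abstract (original)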

This paper investigates the role of dynamic external knowledge integration in improving counter-argument generation using Large Language Models (LLMs). While LLMs have shown promise in argumentative tasks, their tendency to generate lengthy, potentially unfactual responses highlights the need for more controlled and evidence-based approaches. We introduce a new manually curated dataset of argument and counter-argument pairs specifically designed to balance argumentative complexity with evaluative feasibility. We also propose a new LLM-as-a-Judge evaluation methodology that shows a stronger correlation with human judgments compared to traditional reference-based metrics. Our experimental results demonstrate that integrating dynamic external knowledge from the web significantly improves the quality of generated counter-arguments, particularly in terms of relatedness, persuasiveness, and factuality. The findings suggest that combining LLMs with real-time external knowledge retrieval offers a promising direction for developing more effective and reliable counter-argumentation systems.

arXiv information

Authors	Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri
Published	2025-03-07 11:13:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Speculative Decoding for Multi-Sample Inference

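Posted on March 10, 2025 by jarxiv

Summary

We propose a novel speculative decoding method tailored to multi-sample reasoning scenarios such as self-consistency and Best-of-N sampling.
Our method exploits the intrinsic consensus among parallel generation paths to synthesize high-quality draft tokens without auxiliary models or external databases.
By dynamically analyzing structural patterns across the parallel reasoning paths through a probabilistic aggregation mechanism, it identifies consensus token sequences that align with the decoding distribution.
Evaluations on mathematical reasoning benchmarks show a substantial improvement in draft acceptance rates over baselines, together with lower latency in draft token construction.
This work establishes a paradigm shift for efficient multi-sample inference, enabling seamless integration of speculative decoding with sampling-based reasoning techniques.

Abstract (original)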

We propose a novel speculative decoding method tailored for multi-sample reasoning scenarios, such as self-consistency and Best-of-N sampling. Our method exploits the intrinsic consensus of parallel generation paths to synthesize high-quality draft tokens without requiring auxiliary models or external databases. By dynamically analyzing structural patterns across parallel reasoning paths through a probabilistic aggregation mechanism, it identifies consensus token sequences that align with the decoding distribution. Evaluations on mathematical reasoning benchmarks demonstrate a substantial improvement in draft acceptance rates over baselines, while reducing the latency in draft token construction. This work establishes a paradigm shift for efficient multi-sample inference, enabling seamless integration of speculative decoding with sampling-based reasoning techniques.
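The idea of drafting tokens from the consensus of parallel paths can be sketched in a few lines of Python. This is an illustrative assumption, not the paper's algorithm: the suffix-matching step, the per-position majority vote (standing in for the "probabilistic aggregation mechanism"), the `min_agree` ratio, and all function names are hypothetical. In a real system the resulting draft would still have to be verified by the target model, as in ordinary speculative decoding.

```python
from collections import Counter


def continuations(path, suffix, max_len):
    """Return the tokens that follow each occurrence of `suffix` in `path`."""
    n = len(suffix)
    found = []
    for i in range(len(path) - n + 1):
        if path[i:i + n] == suffix:
            cont = path[i + n:i + n + max_len]
            if cont:
                found.append(cont)
    return found


def consensus_draft(current, others, suffix_len=2, max_len=4, min_agree=0.6):
    """Build a draft by majority vote over what other paths wrote after the
    same recent context (the last `suffix_len` tokens of `current`)."""
    suffix = current[-suffix_len:]
    candidates = []
    for path in others:
        candidates.extend(continuations(path, suffix, max_len))

    draft = []
    for pos in range(max_len):
        votes = Counter(c[pos] for c in candidates if len(c) > pos)
        if not votes:
            break
        token, count = votes.most_common(1)[0]
        # Stop drafting once the remaining candidates no longer agree enough.
        if count / len(candidates) < min_agree:
            break
        draft.append(token)
        # Keep only the candidates that are consistent with the draft so far.
        candidates = [c for c in candidates if len(c) > pos and c[pos] == token]
    return draft


# Toy token sequences (whitespace-split words stand in for model tokens).
current = "so x = 3 + 4".split()
others = [
    "x = 3 + 4 = 7 .".split(),
    "we get 3 + 4 = 7 so".split(),
    "3 + 4 is 7".split(),
]
draft = consensus_draft(current, others)
print(draft)  # ['=', '7']
```

Two of the three sibling paths continue "+ 4" with "= 7", so those two tokens are drafted; drafting stops at the third position, where the remaining paths disagree.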

arXiv information

Authors	Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Yueqi Zhang, Ji Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li
Published	2025-03-07 11:15:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices

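Posted on March 10, 2025 by jarxiv

Summary

Large language models (LLMs) such as GPTs and Mixtral-8x7B have revolutionized machine intelligence thanks to their exceptional abilities on generic ML tasks.
Moving LLMs from datacenters to edge devices brings benefits such as better privacy and availability, but is challenged by their massive parameter size and the resulting unbearable runtime costs.
To this end, we present EdgeMoE, an on-device inference engine for mixture-of-expert (MoE) LLMs, a popular form of sparse LLM that scales its parameter size with almost constant computing complexity.
EdgeMoE achieves both memory and compute efficiency by partitioning the model across the storage hierarchy: non-expert weights are held in device memory,
while expert weights are held on external storage and fetched into memory only when activated.
This design is motivated by the key observation that expert weights are bulky but infrequently used because of sparse activation.
To further reduce the expert I/O swapping overhead, EdgeMoE incorporates two novel techniques: (1) expert-wise bitwidth adaptation, which shrinks expert sizes with tolerable accuracy loss;
(2) expert preloading, which predicts the activated experts ahead of time and preloads them through the compute-I/O pipeline.
On popular MoE LLMs and edge devices, EdgeMoE shows significant memory savings and speedup over competitive baselines.
The code is available at https://github.com/UbiquitousLearning/mllm.

Abstract (original)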

Large language models (LLMs) such as GPTs and Mixtral-8x7B have revolutionized machine intelligence due to their exceptional abilities in generic ML tasks. Transiting LLMs from datacenters to edge devices brings benefits like better privacy and availability, but is challenged by their massive parameter size and thus unbearable runtime costs. To this end, we present EdgeMoE, an on-device inference engine for mixture-of-expert (MoE) LLMs — a popular form of sparse LLM that scales its parameter size with almost constant computing complexity. EdgeMoE achieves both memory- and compute-efficiency by partitioning the model into the storage hierarchy: non-expert weights are held in device memory; while expert weights are held on external storage and fetched to memory only when activated. This design is motivated by a key observation that expert weights are bulky but infrequently used due to sparse activation. To further reduce the expert I/O swapping overhead, EdgeMoE incorporates two novel techniques: (1) expert-wise bitwidth adaptation that reduces the expert sizes with tolerable accuracy loss; (2) expert preloading that predicts the activated experts ahead of time and preloads it with the compute-I/O pipeline. On popular MoE LLMs and edge devices, EdgeMoE showcase significant memory savings and speedup over competitive baselines. The code is available at https://github.com/UbiquitousLearning/mllm.
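The storage-hierarchy idea (experts live on external storage and are fetched or preloaded into a bounded memory budget) can be sketched as a small cache in Python. This is an illustration only and does not reflect the EdgeMoE code: the `ExpertCache` class, its method names, and the least-recently-used eviction policy are assumptions, and a plain dictionary stands in for external storage.

```python
from collections import OrderedDict


class ExpertCache:
    """Bounded in-memory cache for expert weights kept on slower storage."""

    def __init__(self, storage, capacity):
        self.storage = storage        # stand-in for flash or disk
        self.capacity = capacity      # max number of experts held in memory
        self.cache = OrderedDict()    # expert_id -> weights, oldest first
        self.loads = 0                # number of reads from storage

    def _ensure_loaded(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)     # evict least recently used (assumed policy)
        self.cache[expert_id] = self.storage[expert_id]
        self.loads += 1

    def preload(self, predicted_ids):
        """Fetch experts that a predictor expects to be activated soon."""
        for expert_id in predicted_ids:
            self._ensure_loaded(expert_id)

    def get(self, expert_id):
        """Return the weights of an activated expert, loading them if needed."""
        self._ensure_loaded(expert_id)
        return self.cache[expert_id]


# Eight hypothetical experts on "storage", room for two in memory.
storage = {i: f"weights-{i}" for i in range(8)}
cache = ExpertCache(storage, capacity=2)
cache.preload([3, 5])   # predicted experts: two storage reads
cache.get(3)            # hit: already preloaded, no extra read
cache.get(1)            # miss: evicts expert 5, one more read
print(cache.loads)      # 3
```

A correct prediction turns the later `get` into a memory hit, which is the effect preloading is meant to achieve; in the paper the fetch additionally overlaps with computation in a pipeline, which this sketch does not model.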

arXiv information

Authors	Rongjie Yi, Liwei Guo, Shiyun Wei, Ao Zhou, Shangguang Wang, Mengwei Xu
Published	2025-03-07 11:16:40+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

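Posted on March 10, 2025 by jarxiv

Summary

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks.
Annotation quality tends to deteriorate with the shift from expert-based to crowd-sourced labelling.
To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems.
CAMEL has two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence.
By deploying a label correction mechanism, CAMEL can also be used for data cleaning.
We evaluate CAMEL on two sequential tasks, with special emphasis on dialogue belief tracking, a task plagued by limited and noisy datasets.
Our experiments show that CAMEL significantly outperforms the baselines in terms of efficiency.
Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.

Abstract (original)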

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
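The two core features (expert labels for only part of a sequence, self-supervision for the rest) can be illustrated with a minimal Python sketch. It assumes that per-step confidence scores are already available and uses a fixed threshold; the function name `acquire_labels`, the 0.8 threshold, and the toy dialogue-state values are hypothetical, and CAMEL's actual confidence estimation is not modelled here.

```python
def acquire_labels(predictions, confidences, oracle, threshold=0.8):
    """Query the expert only where the model is unsure; keep the model's
    own predictions as labels everywhere else."""
    labels, queried = [], []
    for step, (pred, conf) in enumerate(zip(predictions, confidences)):
        if conf < threshold:
            labels.append(oracle(step))   # expert annotation for this step
            queried.append(step)
        else:
            labels.append(pred)           # self-supervised label
    return labels, queried


# Toy sequence: model predictions with confidences, plus expert ground truth.
predictions = ["hotel", "cheap", "north", "4 stars"]
confidences = [0.95, 0.40, 0.90, 0.60]
gold = ["hotel", "moderate", "north", "4 stars"]

labels, queried = acquire_labels(predictions, confidences, lambda step: gold[step])
print(queried)  # [1, 3]
print(labels)   # ['hotel', 'moderate', 'north', '4 stars']
```

Here the expert is consulted for two of four steps, and one of those consultations corrects a wrong prediction.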

arXiv information

Authors	Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić
Published	2025-03-07 11:23:19+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.LG

AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications

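Posted on March 10, 2025 by jarxiv

Summary

The advent of large language models (LLMs) has profoundly transformed our lives, revolutionizing interactions with AI and lowering the barrier to AI usage.
Although LLMs are designed primarily for natural language interaction, their extensive embedded knowledge lets them comprehend digital sensor data.
This capability enables LLMs to engage with the physical world through IoT sensors and actuators and to perform a myriad of AIoT tasks.
As a result, this evolution triggers a paradigm shift in conventional AIoT application development, democratizing access for everyone by facilitating the design and development of AIoT applications via natural language.
However, several limitations must be addressed to unlock the full potential of LLMs in AIoT application development.
First, existing solutions often require transferring raw sensor data to LLM servers, which raises privacy concerns, incurs high query fees, and is limited by token size.
Moreover, the reasoning processes of LLMs are opaque to users, making it difficult to verify the robustness and correctness of inference results.
This paper introduces AutoIOT, an LLM-based automated program generator for AIoT applications.
AutoIOT lets users specify their requirements in natural language (input) and automatically synthesizes interpretable programs with documentation (output).
AutoIOT automates iterative optimization to improve the quality of the generated code with minimal user involvement.
AutoIOT not only makes the execution of AIoT tasks more explainable but also mitigates privacy concerns and reduces token costs through local execution of the synthesized programs.
Extensive experiments and user studies demonstrate AutoIOT's remarkable capability in program synthesis for various AIoT tasks.
The synthesized programs can match and even outperform some representative baselines.

Abstract (original)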

The advent of Large Language Models (LLMs) has profoundly transformed our lives, revolutionizing interactions with AI and lowering the barrier to AI usage. While LLMs are primarily designed for natural language interaction, the extensive embedded knowledge empowers them to comprehend digital sensor data. This capability enables LLMs to engage with the physical world through IoT sensors and actuators, performing a myriad of AIoT tasks. Consequently, this evolution triggers a paradigm shift in conventional AIoT application development, democratizing its accessibility to all by facilitating the design and development of AIoT applications via natural language. However, some limitations need to be addressed to unlock the full potential of LLMs in AIoT application development. First, existing solutions often require transferring raw sensor data to LLM servers, which raises privacy concerns, incurs high query fees, and is limited by token size. Moreover, the reasoning processes of LLMs are opaque to users, making it difficult to verify the robustness and correctness of inference results. This paper introduces AutoIOT, an LLM-based automated program generator for AIoT applications. AutoIOT enables users to specify their requirements using natural language (input) and automatically synthesizes interpretable programs with documentation (output). AutoIOT automates the iterative optimization to enhance the quality of generated code with minimum user involvement. AutoIOT not only makes the execution of AIoT tasks more explainable but also mitigates privacy concerns and reduces token costs with local execution of synthesized programs. Extensive experiments and user studies demonstrate AutoIOT’s remarkable capability in program synthesis for various AIoT tasks. The synthesized programs can match and even outperform some representative baselines.

arXiv information

Authors	Leming Shen, Qiang Yang, Yuanqing Zheng, Mo Li
Published	2025-03-07 11:40:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.SE

GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation

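Posted on March 10, 2025 by jarxiv

Summary

Automatic medical report generation supports clinical diagnosis, reduces radiologists' workload, and holds the promise of improving diagnostic consistency.
However, existing evaluation metrics mainly assess how accurately generated reports cover key medical information compared with human-written reports, while overlooking crucial details such as the location and certainty of reported abnormalities.
These limitations hinder a comprehensive assessment of the reliability of generated reports and pose risks when selecting them for clinical use.
We therefore propose the Granular Explainable Multi-Agent Score (GEMA-Score), which performs both objective quantification and subjective evaluation through a large language model-based multi-agent workflow.
GEMA-Score parses structured reports and applies NER-F1 calculations, through interactive exchanges of information among agents, to assess disease diagnosis, location, severity, and uncertainty.
In addition, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback.
Extensive experiments validate that GEMA-Score achieves the highest correlation with human expert evaluations on a public dataset, demonstrating its effectiveness in clinical scoring (Kendall coefficient = 0.70 on the Rexval dataset and 0.54 on the RadEvalX dataset).
The anonymous project demo is available at https://github.com/Zhenxuan-Zhang/GEMA_score.

Abstract (original)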

Automatic medical report generation supports clinical diagnosis, reduces the workload of radiologists, and holds the promise of improving diagnosis consistency. However, existing evaluation metrics primarily assess the accuracy of key medical information coverage in generated reports compared to human-written reports, while overlooking crucial details such as the location and certainty of reported abnormalities. These limitations hinder the comprehensive assessment of the reliability of generated reports and pose risks in their selection for clinical use. Therefore, we propose a Granular Explainable Multi-Agent Score (GEMA-Score) in this paper, which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs NER-F1 calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments validate that GEMA-Score achieves the highest correlation with human expert evaluations on a public dataset, demonstrating its effectiveness in clinical scoring (Kendall coefficient = 0.70 for Rexval dataset and Kendall coefficient = 0.54 for RadEvalX dataset). The anonymous project demo is available at: https://github.com/Zhenxuan-Zhang/GEMA_score.
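An entity-level F1 of the kind the abstract calls NER-F1 can be sketched as follows. This is a generic set-overlap F1 in Python, not the GEMA-Score code: representing each finding as a `(text, category)` pair, requiring exact matches, and the example report entities are all assumptions made for illustration.

```python
def entity_f1(predicted, reference):
    """F1 over exact-match entity sets; each entity is a (text, category) pair."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # nothing to find and nothing reported
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical entities extracted from a reference and a generated report.
reference = {
    ("pneumonia", "disease"),
    ("left lower lobe", "location"),
    ("mild", "severity"),
}
generated = {
    ("pneumonia", "disease"),
    ("right lower lobe", "location"),  # wrong location
    ("mild", "severity"),
}
score = entity_f1(generated, reference)
print(round(score, 3))  # 0.667
```

The generated report names the right disease and severity but the wrong location, so two of three entities match on each side and the score is 2/3. Scoring each category separately would expose the location error directly.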

arXiv information

Authors	Zhenxuan Zhang, Kinhei Lee, Weihang Deng, Huichi Zhou, Zihao Jin, Jiahao Huang, Zhifan Gao, Dominic C Marshall, Yingying Fang, Guang Yang
Published	2025-03-07 11:42:22+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.MA

Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data

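Posted on March 10, 2025 by jarxiv

Summary

Zero-shot named entity recognition (NER) is the task of detecting named entities of specific types (such as 'Person' or 'Medicine') without any training examples.
Current research increasingly relies on large synthetic datasets, automatically generated to cover tens of thousands of distinct entity types, to train zero-shot NER models.
In this paper, however, we find that these synthetic datasets often contain entity types that are semantically very similar to (or even the same as) those in standard evaluation benchmarks.
Because of this overlap, we argue that reported F1 scores for zero-shot NER overestimate the true capabilities of these approaches.
We further argue that current evaluation setups give an incomplete picture of zero-shot abilities because they do not quantify the label shift (that is, the similarity of labels) between training and evaluation datasets.
To address these issues, we propose Familiarity, a novel metric that captures both the semantic similarity between entity types in training and evaluation and their frequency in the training data, providing an estimate of label shift.
It lets researchers contextualize reported zero-shot NER scores when using custom synthetic training datasets.
It also enables researchers to generate evaluation setups of varying transfer difficulty for fine-grained analysis of zero-shot NER.

Abstract (original)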

Zero-shot named entity recognition (NER) is the task of detecting named entities of specific types (such as ‘Person’ or ‘Medicine’) without any training examples. Current research increasingly relies on large synthetic datasets, automatically generated to cover tens of thousands of distinct entity types, to train zero-shot NER models. However, in this paper, we find that these synthetic datasets often contain entity types that are semantically highly similar to (or even the same as) those in standard evaluation benchmarks. Because of this overlap, we argue that reported F1 scores for zero-shot NER overestimate the true capabilities of these approaches. Further, we argue that current evaluation setups provide an incomplete picture of zero-shot abilities since they do not quantify the label shift (i.e., the similarity of labels) between training and evaluation datasets. To address these issues, we propose Familiarity, a novel metric that captures both the semantic similarity between entity types in training and evaluation, as well as their frequency in the training data, to provide an estimate of label shift. It allows researchers to contextualize reported zero-shot NER scores when using custom synthetic training datasets. Further, it enables researchers to generate evaluation setups of various transfer difficulties for fine-grained analysis of zero-shot NER.

arXiv information

Authors	Jonas Golde, Patrick Haller, Max Ploner, Fabio Barth, Nicolaas Jedema, Alan Akbik
Published	2025-03-07 11:54:22+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL

Improving Hate Speech Classification with Cross-Taxonomy Dataset Integration

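Posted on March 10, 2025 by jarxiv

Summary

Algorithmic hate speech detection faces significant challenges because of the diverse definitions and datasets used in research and practice.
Social media platforms, legal frameworks, and institutions each apply distinct yet overlapping definitions, which complicates classification efforts.
This study addresses these challenges by demonstrating that existing datasets and taxonomies can be integrated into a unified model, improving prediction performance and reducing reliance on multiple specialized classifiers.
The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework.
Our approach is validated by combining two widely used but differently annotated datasets, which shows improved classification performance on an independent test set.
This work highlights the potential of dataset and taxonomy integration for advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.

Abstract (original)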

Algorithmic hate speech detection faces significant challenges due to the diverse definitions and datasets used in research and practice. Social media platforms, legal frameworks, and institutions each apply distinct yet overlapping definitions, complicating classification efforts. This study addresses these challenges by demonstrating that existing datasets and taxonomies can be integrated into a unified model, enhancing prediction performance and reducing reliance on multiple specialized classifiers. The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework. Our approach is validated by combining two widely used but differently annotated datasets, showing improved classification performance on an independent test set. This work highlights the potential of dataset and taxonomy integration in advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.

arXiv information

Authors	Jan Fillies, Adrian Paschke
Published	2025-03-07 12:01:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG, cs.SI

Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter

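Posted on March 10, 2025 by jarxiv

Summary

Growing emotional stress in modern society has increased the demand for Emotional Support Conversations (ESC).
Large language models (LLMs) show promise for ESC, but they face two key challenges: (1) low strategy selection accuracy and (2) preference bias, both of which limit their adaptability to users' emotional needs.
Existing supervised fine-tuning (SFT) struggles to address these issues because it rigidly trains models on single gold-standard responses without modeling nuanced strategy trade-offs.
To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn.
We first use Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs.
Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses.
Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B show that CSO outperforms standard SFT, highlighting the efficacy of fine-grained, turn-level preference modeling in ESC.

Abstract (original)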

The growing emotional stress in modern society has increased the demand for Emotional Support Conversations (ESC). While Large Language Models (LLMs) show promise for ESC, they face two key challenges: (1) low strategy selection accuracy, and (2) preference bias, limiting their adaptability to emotional needs of users. Existing supervised fine-tuning (SFT) struggles to address these issues, as it rigidly trains models on single gold-standard responses without modeling nuanced strategy trade-offs. To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses. Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B demonstrate that CSO outperforms standard SFT, highlighting the efficacy of fine-grained, turn-level preference modeling in ESC.

arXiv information

Authors	Weixiang Zhao, Xingyu Sui, Xinyang Han, Yang Deng, Yulin Hu, Jiahe Guo, Libo Qin, Qianyun Du, Shijin Wang, Yanyan Zhao, Bing Qin, Ting Liu
Published	2025-03-07 12:07:59+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL
