jarxiv | Japanese arxiv | Page 1231

On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation

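Posted: March 24, 2025 | Author: jarxiv

Summary

This paper investigates whether large language models (LLMs) fine-tuned on synthetic but domain-representative data can perform a twofold task for a smart home assistant, (i) slot and intent detection and (ii) natural language response generation, while running solely on resource-limited, CPU-only edge hardware.
We fine-tune LLMs to produce both JSON action calls and text responses.
Our experiments show that the 16-bit and 8-bit quantized variants preserve high accuracy on slot and intent detection and strong semantic coherence in the generated text, whereas the 4-bit model retains generative fluency but suffers a noticeable drop in device-service classification accuracy.
Further evaluation on noisy human (non-synthetic) prompts and out-of-domain intents confirms the models' generalization ability, with accuracy of around 80–86%.
The average inference time of 5–6 seconds per query is acceptable for one-shot commands but suboptimal for multi-turn dialogue; even so, our results affirm that an on-device LLM can effectively unify command interpretation and flexible response generation for home automation without relying on specialized hardware.

Abstract (original)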

This paper investigates whether Large Language Models (LLMs), fine-tuned on synthetic but domain-representative data, can perform the twofold task of (i) slot and intent detection and (ii) natural language response generation for a smart home assistant, while running solely on resource-limited, CPU-only edge hardware. We fine-tune LLMs to produce both JSON action calls and text responses. Our experiments show that 16-bit and 8-bit quantized variants preserve high accuracy on slot and intent detection and maintain strong semantic coherence in generated text, while the 4-bit model, while retaining generative fluency, suffers a noticeable drop in device-service classification accuracy. Further evaluations on noisy human (non-synthetic) prompts and out-of-domain intents confirm the models’ generalization ability, obtaining around 80–86% accuracy. While the average inference time is 5–6 seconds per query (acceptable for one-shot commands but suboptimal for multi-turn dialogue), our results affirm that an on-device LLM can effectively unify command interpretation and flexible response generation for home automation without relying on specialized hardware.
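
To make the "JSON action call plus text response" idea concrete, here is a minimal Python sketch of how such a combined output might be consumed. The abstract does not give the output schema, so the field names (`action`, `device`, `service`, `slots`, `response`) and the example values are hypothetical and only illustrate the general shape.

```python
import json


def parse_assistant_output(raw: str) -> tuple[dict, str]:
    """Split one model output into (action_call, response_text).

    Assumption: the model emits a single JSON object with an "action"
    field and a "response" field. This schema is hypothetical; the
    paper's actual format may differ.
    """
    payload = json.loads(raw)
    action = payload["action"]
    response = payload["response"]
    # Minimal validation of the hypothetical action schema.
    for key in ("device", "service"):
        if key not in action:
            raise ValueError(f"missing action field: {key}")
    return action, response


# Hypothetical model output for the command "turn on the kitchen light at 80%".
example_output = (
    '{"action": {"device": "light.kitchen", "service": "turn_on", '
    '"slots": {"brightness": 80}}, '
    '"response": "Turning on the kitchen light at 80% brightness."}'
)

action, response = parse_assistant_output(example_output)
```

In a setup like this, the action dictionary would go to the home automation backend while the response string is shown or spoken to the user.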

arXiv information

Authors	Rune Birkmose,Nathan Mørkeberg Reece,Esben Hofstedt Norvin,Johannes Bjerva,Mike Zhang
Published	2025-03-21 08:23:56+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL

When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making

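Posted: March 24, 2025 | Author: jarxiv

Summary

Embodied decision-making is fundamental for AI agents operating in real-world environments.
Visual language models (VLMs) have advanced this capability, but they still struggle with complex decisions, particularly in human-centered situations that require deep reasoning about human needs and values.
In this study, we systematically evaluate open-source VLMs on multimodal human-centered decision-making tasks.
We find that LLMs receiving only textual descriptions unexpectedly outperform similarly sized VLM counterparts that process the actual images, suggesting that visual alignment may hinder VLM abilities.
To address this challenge, we propose a novel text-only training approach that uses synthesized textual data.
The method strengthens the language component of VLMs and transfers the learned abilities to multimodal inference, eliminating the need for expensive image-text paired data.
We further show that VLMs can achieve substantial performance gains through self-improvement, using training data generated by their LLM counterparts rather than relying on larger teacher models such as GPT-4.
Our findings establish a more efficient and scalable approach to enhancing the human-centered decision-making capabilities of VLMs and open new avenues for optimizing VLMs through self-improvement mechanisms.

Abstract (original)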

Embodied decision-making is fundamental for AI agents operating in real-world environments. While Visual Language Models (VLMs) have advanced this capability, they still struggle with complex decisions, particularly in human-centered situations that require deep reasoning about human needs and values. In this study, we systematically evaluate open-sourced VLMs on multimodal human-centered decision-making tasks. We find that LLMs receiving only textual descriptions unexpectedly outperform their VLM counterparts of similar scale that process actual images, suggesting that visual alignment may hinder VLM abilities. To address this challenge, we propose a novel text-only training approach with synthesized textual data. This method strengthens VLMs’ language components and transfers the learned abilities to multimodal inference, eliminating the need for expensive image-text paired data. Furthermore, we show that VLMs can achieve substantial performance gains through self-improvement, using training data generated by their LLM counterparts rather than relying on larger teacher models like GPT-4. Our findings establish a more efficient and scalable approach to enhancing VLMs’ human-centered decision-making capabilities, opening new avenues for optimizing VLMs through self-improvement mechanisms.

arXiv information

Authors	Zhe Hu,Jing Li,Yu Yin
Published	2025-03-21 09:25:23+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.CV

Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script

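Posted: March 24, 2025 | Author: jarxiv

Summary

DNN-based language models perform excellently on a wide range of tasks, but even SOTA LLMs are susceptible to textual adversarial attacks.
Adversarial texts play a crucial role in multiple subfields of NLP.
However, current research has the following issues.
(1) Most textual adversarial attack methods target resource-rich languages.
How can adversarial texts be generated for less-studied languages?
(2) Most textual adversarial attack methods tend to generate invalid or ambiguous adversarial texts.
How can high-quality adversarial robustness benchmarks be constructed?
(3) New language models may be immune to some previously generated adversarial texts.
How should adversarial robustness benchmarks be updated?
To address these issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts.
HITL-GAT comprises four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation.
In addition, we use HITL-GAT to conduct a case study on Tibetan script, which can serve as a reference for adversarial research on other less-studied languages.

Abstract (original)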

DNN-based language models perform excellently on various tasks, but even SOTA LLMs are susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues. (1) Most textual adversarial attack methods target rich-resourced languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to part of previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we utilize HITL-GAT to make a case study on Tibetan script which can be a reference for the adversarial research of other less-studied languages.

arXiv information

Authors	Xi Cao,Yuan Sun,Jiajun Li,Quzong Gesang,Nuo Qun,Tashi Nyima
Published	2025-03-21 09:32:39+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.CR, cs.HC

Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks

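Posted: March 24, 2025 | Author: jarxiv

Summary

This study provides the first comprehensive assessment of consistency and reproducibility in large language model (LLM) outputs in finance and accounting research.
We evaluate how consistently LLMs produce outputs given identical inputs, through extensive experiments with 50 independent runs across five common tasks: classification, sentiment analysis, summarization, text generation, and prediction.
Using three OpenAI models (GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), we generate more than 3.4 million outputs from diverse financial source texts and data, covering MD&As, FOMC statements, finance news articles, earnings call transcripts, and financial statements.
Our findings reveal substantial but task-dependent consistency: binary classification and sentiment analysis achieve near-perfect reproducibility, while more complex tasks show greater variability.
More advanced models do not consistently demonstrate better consistency and reproducibility; instead, task-specific patterns emerge.
LLMs significantly outperform expert human annotators in consistency and maintain high agreement even where human experts significantly disagree.
We also find that simple aggregation strategies across 3–5 runs dramatically improve consistency.
Simulation analysis shows that, despite measurable inconsistency in LLM outputs, downstream statistical inferences remain remarkably robust.
These findings address concerns about what we term "G-hacking", the selective reporting of favorable outcomes from multiple generative AI runs, by demonstrating that such risks are relatively low for finance and accounting tasks.

Abstract (original)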

This study provides the first comprehensive assessment of consistency and reproducibility in Large Language Model (LLM) outputs in finance and accounting research. We evaluate how consistently LLMs produce outputs given identical inputs through extensive experimentation with 50 independent runs across five common tasks: classification, sentiment analysis, summarization, text generation, and prediction. Using three OpenAI models (GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), we generate over 3.4 million outputs from diverse financial source texts and data, covering MD&As, FOMC statements, finance news articles, earnings call transcripts, and financial statements. Our findings reveal substantial but task-dependent consistency, with binary classification and sentiment analysis achieving near-perfect reproducibility, while complex tasks show greater variability. More advanced models do not consistently demonstrate better consistency and reproducibility, with task-specific patterns emerging. LLMs significantly outperform expert human annotators in consistency and maintain high agreement even where human experts significantly disagree. We further find that simple aggregation strategies across 3-5 runs dramatically improve consistency. Simulation analysis reveals that despite measurable inconsistency in LLM outputs, downstream statistical inferences remain remarkably robust. These findings address concerns about what we term ‘G-hacking,’ the selective reporting of favorable outcomes from multiple Generative AI runs, by demonstrating that such risks are relatively low for finance and accounting tasks.
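
As an illustration of what a "simple aggregation strategy across 3-5 runs" could look like, the Python sketch below takes a majority vote over repeated classification outputs. The abstract does not say which aggregation rules the authors used, so majority voting, the function name, and the sample labels are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels):
    """Return (most common label, share of runs that agree with it).

    Assumption: aggregation is a plain majority vote over the labels
    returned by repeated runs on the same input. Ties resolve to the
    label that appeared first.
    """
    if not labels:
        raise ValueError("need at least one run")
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


# Hypothetical sentiment labels from five runs on one identical input.
runs = ["positive", "positive", "negative", "positive", "positive"]
label, agreement = majority_vote(runs)
```

The agreement share doubles as a rough per-item consistency signal: inputs where it stays low across runs are the ones worth inspecting by hand.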

arXiv information

Authors	Julian Junyan Wang,Victor Xiaoqi Wang
Published	2025-03-21 09:43:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CE, cs.CL, cs.LG, q-fin.GN

Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models

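Posted: March 24, 2025 | Author: jarxiv

Summary

Token-based video representation has emerged as a promising approach for enabling large language models to interpret video content.
However, existing token reduction techniques such as token pruning and token merging often disrupt essential spatial-temporal positional embeddings and fail to adequately balance computational efficiency with a reduced token count.
As a result, these methods yield relatively long token sequences, which limits their applicability in scenarios that require extreme token compression, such as video large language models.
In this paper, we introduce the novel task of extreme short token reduction, which aims to represent extensive video sequences with a minimal number of tokens.
To address this challenge, we propose Token Dynamics, a new video representation framework that dynamically reduces the token count while preserving spatial-temporal coherence.
Specifically, we disentangle video representations by separating visual embeddings from grid-level motion information and structure them into: 1. a concise token base, created by clustering tokens that describe object-level content;
2. a token dynamics map, which captures detailed spatial-temporal motion patterns across grids.
Furthermore, we introduce a cross-dynamics attention mechanism that integrates motion features into the token base without increasing token length, thereby maintaining compactness and spatial-temporal integrity.
Experiments show that the token count is reduced to just 0.07% of the original tokens, with only a minor performance drop of 1.13%.
Additionally, we propose two novel subtasks within extreme token reduction (fixed-length and adaptive-length compression), both of which effectively represent long token sequences for video-language tasks.
The method offers significantly lower theoretical complexity, fewer tokens, and higher throughput, providing an efficient solution for video LLMs.

Abstract (original)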

Token-based video representation has emerged as a promising approach for enabling large language models to interpret video content. However, existing token reduction techniques, such as token pruning and token merging, often disrupt essential spatial-temporal positional embeddings, failing to adequately balance computational efficiency with fewer tokens. Consequently, these methods result in relatively lengthy token sequences, limiting their applicability in scenarios requiring extreme token compression, such as video large language models. In this paper, we introduce the novel task of extreme short token reduction, aiming to represent extensive video sequences with a minimal number of tokens. To address this challenge, we propose Token Dynamics, a new video representation framework that dynamically reduces token count while preserving spatial-temporal coherence. Specifically, we disentangle video representations by separating visual embeddings from grid-level motion information, structuring them into: 1. a concise token base, created by clustering tokens that describe object-level content; 2. a token dynamics map, capturing detailed spatial-temporal motion patterns across grids. Furthermore, we introduce a cross-dynamics attention mechanism that integrates motion features into the token base without increasing token length, thereby maintaining compactness and spatial-temporal integrity. The experiments demonstrate a reduction of token count to merely 0.07% of the original tokens, with only a minor performance drop of 1.13%. Additionally, we propose two novel subtasks within extreme token reduction (fixed-length and adaptive-length compression), both effectively representing long token sequences for video-language tasks. Our method offers significantly lower theoretical complexity, fewer tokens, and enhanced throughput, thus providing an efficient solution for video LLMs.
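
The idea of a small "token base" obtained by clustering can be illustrated with a toy Python sketch. It runs plain k-means over a handful of 2-D vectors and keeps a per-token cluster index. This is not the paper's algorithm: the k-means choice, the deterministic initialization, the function names, and the sample vectors are all assumptions. The index map here records only cluster membership, which is a much cruder object than the token dynamics map the abstract describes.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_token_base(tokens, k, iterations=10):
    """Cluster token vectors into k centroids with plain k-means.

    Returns (centroids, assignment): the centroids play the role of a
    compact token base, and assignment[i] is the centroid index of
    token i. Both roles are illustrative assumptions.
    """
    centroids = [list(t) for t in tokens[:k]]  # deterministic init: first k tokens
    assignment = [0] * len(tokens)
    for _ in range(iterations):
        # Assign each token to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(t, centroids[c]))
            for t in tokens
        ]
        # Recompute each centroid as the mean of its member tokens.
        for c in range(k):
            members = [t for t, a in zip(tokens, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids, assignment


# Six hypothetical token embeddings forming two well-separated groups.
tokens = [
    (0.0, 0.1), (5.0, 5.1), (0.1, 0.0),
    (5.1, 5.0), (0.05, 0.05), (5.05, 5.05),
]
token_base, index_map = build_token_base(tokens, k=2)
```

Six tokens collapse to two base vectors plus six small indices. The abstract reports a far more aggressive ratio, with the token count reduced to 0.07% of the original.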

arXiv information

Authors	Haichao Zhang,Zhuowei Li,Dimitris Metaxas,Yun Fu
Published	2025-03-21 09:46:31+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Building Multilingual Datasets for Predicting Mental Health Severity through LLMs: Prospects and Challenges

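Posted: March 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) are increasingly being integrated into various medical fields, including mental health support systems.
However, there is a gap in research on the effectiveness of LLMs in non-English mental health support applications.
To address this problem, we present a novel multilingual adaptation of widely used mental health datasets, translated from English into six languages (Greek, Turkish, French, Portuguese, German, and Finnish).
This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages.
Experimenting with GPT and Llama, we observe considerable variability in performance across languages, even though the models are evaluated on the same translated dataset.
This inconsistency underscores the complexities inherent in multilingual mental health support, where language-specific nuances and mental health data coverage can affect model accuracy.
Through comprehensive error analysis, we emphasize the risks of relying exclusively on LLMs in medical settings (for example, their potential to contribute to misdiagnoses).
Moreover, the proposed approach offers significant cost savings for multilingual tasks, a major advantage for broad-scale implementation.

Abstract (original)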

Large Language Models (LLMs) are increasingly being integrated into various medical fields, including mental health support systems. However, there is a gap in research regarding the effectiveness of LLMs in non-English mental health support applications. To address this problem, we present a novel multilingual adaptation of widely-used mental health datasets, translated from English into six languages (e.g., Greek, Turkish, French, Portuguese, German, and Finnish). This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages. By experimenting with GPT and Llama, we observe considerable variability in performance across languages, despite being evaluated on the same translated dataset. This inconsistency underscores the complexities inherent in multilingual mental health support, where language-specific nuances and mental health data coverage can affect the accuracy of the models. Through comprehensive error analysis, we emphasize the risks of relying exclusively on LLMs in medical settings (e.g., their potential to contribute to misdiagnoses). Moreover, our proposed approach offers significant cost savings for multilingual tasks, presenting a major advantage for broad-scale implementation.

arXiv information

Authors	Konstantinos Skianis,John Pavlopoulos,A. Seza Doğruöz
Published	2025-03-21 09:56:15+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.LG

A Survey on Personalized Alignment — The Missing Piece for Large Language Models in Real-World Applications

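Posted: March 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have demonstrated remarkable capabilities, yet their transition to real-world applications reveals a critical limitation: the inability to adapt to individual preferences while maintaining alignment with universal human values.
Current alignment techniques adopt a one-size-fits-all approach that fails to accommodate users' diverse backgrounds and needs.
This paper presents the first comprehensive survey of personalized alignment, a paradigm that enables LLMs to adapt their behavior within ethical boundaries based on individual preferences.
We propose a unified framework comprising preference memory management, personalized generation, and feedback-based alignment, systematically analyze implementation approaches, and evaluate their effectiveness across various scenarios.
By examining current techniques, potential risks, and future challenges, this survey provides a structured foundation for developing more adaptable and ethically aligned LLMs.

Abstract (original)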

Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their transition to real-world applications reveals a critical limitation: the inability to adapt to individual preferences while maintaining alignment with universal human values. Current alignment techniques adopt a one-size-fits-all approach that fails to accommodate users’ diverse backgrounds and needs. This paper presents the first comprehensive survey of personalized alignment, a paradigm that enables LLMs to adapt their behavior within ethical boundaries based on individual preferences. We propose a unified framework comprising preference memory management, personalized generation, and feedback-based alignment, systematically analyzing implementation approaches and evaluating their effectiveness across various scenarios. By examining current techniques, potential risks, and future challenges, this survey provides a structured foundation for developing more adaptable and ethically-aligned LLMs.

arXiv information

Authors	Jian Guan,Junfei Wu,Jia-Nan Li,Chuanqi Cheng,Wei Wu
Published	2025-03-21 10:09:16+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL

Text2Model: Generating dynamic chemical reactor models using large language models (LLMs)

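Posted: March 24, 2025 | Author: jarxiv

Summary

As large language models have shown remarkable capabilities in conversing via natural language, the question arises of how LLMs could assist chemical engineers in research and industry with domain-specific tasks.
We generate dynamic chemical reactor models in Modelica code format from textual descriptions provided as user input.
We fine-tune Llama 3.1 8B Instruct on synthetically generated Modelica code for different reactor scenarios.
We compare the performance of the fine-tuned model with the baseline Llama 3.1 8B Instruct model and GPT4o.
We manually assess the models' predictions for the syntactic and semantic accuracy of the generated dynamic models.
We find that the fine-tuned model achieves considerable improvements in both the semantic and the syntactic accuracy of the Modelica models.
However, compared with GPT4o, the fine-tuned model lacks a satisfactory ability to generalize to unseen scenarios.

Abstract (original)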

As large language models have shown remarkable capabilities in conversing via natural language, the question arises as to how LLMs could potentially assist chemical engineers in research and industry with domain-specific tasks. We generate dynamic chemical reactor models in Modelica code format from textual descriptions as user input. We fine-tune Llama 3.1 8B Instruct on synthetically generated Modelica code for different reactor scenarios. We compare the performance of our fine-tuned model to the baseline Llama 3.1 8B Instruct model and GPT4o. We manually assess the models’ predictions regarding the syntactic and semantic accuracy of the generated dynamic models. We find that considerable improvements are achieved by the fine-tuned model with respect to both the semantic and the syntactic accuracy of the Modelica models. However, the fine-tuned model lacks a satisfactory ability to generalize to unseen scenarios compared to GPT4o.

arXiv information

Authors	Sophia Rupprecht,Yassine Hounat,Monisha Kumar,Giacomo Lastrucci,Artur M. Schweidtmann
Published	2025-03-21 10:09:34+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.PL

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

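Posted: March 24, 2025 | Author: jarxiv

Summary

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm.
To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain, multimodal dataset covering publicly available resources in language, vision, and vision-language tasks.
We further enrich this collection with curated OCR-intensive and Set-of-Mark datasets, extending its diversity and generality.
By training over different base LLMs, including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities.
Comprehensive benchmarking reveals a strong correlation between multimodal performance and the data and parameter scales.
Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

Abstract (original)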

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between the multi-modal performance with the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

arXiv information

Authors	Dongyang Liu,Renrui Zhang,Longtian Qiu,Siyuan Huang,Weifeng Lin,Shitian Zhao,Shijie Geng,Ziyi Lin,Peng Jin,Kaipeng Zhang,Wenqi Shao,Chao Xu,Conghui He,Junjun He,Hao Shao,Pan Lu,Hongsheng Li,Yu Qiao,Peng Gao
Published	2025-03-21 10:19:01+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CV, cs.LG

From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment

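Posted: March 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches that assume uniform human preferences, fundamentally overlooking the diversity of user values and needs.
This paper introduces a comprehensive framework for scalable personalized alignment of LLMs.
We establish a systematic preference space characterizing psychological and behavioral dimensions, alongside diverse persona representations for robust preference inference in real-world scenarios.
Building on this foundation, we introduce AlignX, a large-scale dataset of over 1.3 million personalized preference examples, and develop two complementary alignment approaches: in-context alignment, which conditions directly on persona representations, and preference-bridged alignment, which models intermediate preference distributions.
Extensive experiments demonstrate substantial improvements over existing methods, with an average accuracy gain of 17.06% across four benchmarks, along with strong adaptation to novel preferences, robustness to limited user data, and precise preference controllability.
These results validate the effectiveness of our framework and advance toward truly user-adaptive AI systems.

Abstract (original)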

Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches that assume uniform human preferences, fundamentally overlooking the diversity in user values and needs. This paper introduces a comprehensive framework for scalable personalized alignment of LLMs. We establish a systematic preference space characterizing psychological and behavioral dimensions, alongside diverse persona representations for robust preference inference in real-world scenarios. Building upon this foundation, we introduce AlignX, a large-scale dataset of over 1.3 million personalized preference examples, and develop two complementary alignment approaches: in-context alignment directly conditioning on persona representations and preference-bridged alignment modeling intermediate preference distributions. Extensive experiments demonstrate substantial improvements over existing methods, with an average 17.06% accuracy gain across four benchmarks while exhibiting a strong adaptation capability to novel preferences, robustness to limited user data, and precise preference controllability. These results validate our framework’s effectiveness, advancing toward truly user-adaptive AI systems.

arXiv information

Authors	Jia-Nan Li,Jian Guan,Songhao Wu,Wei Wu,Rui Yan
Published	2025-03-21 10:33:21+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL
