jarxiv | Japanese arxiv | Page 1236

Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation

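Posted: March 24, 2025 by jarxiv

Summary

Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages.
Recently, Graph Neural Network-based micro-video recommendation has shown performance improvements on many kinds of recommendation tasks.
However, existing works fail to fully consider the characteristics of micro-videos, such as the high timeliness of news-nature micro-video recommendation and the sequential interactions of frequently changing interests.
In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized news-nature micro-video recommendation based on sequential sessions, in which the characteristics of micro-videos are comprehensively studied.
Through comparison with the state of the art, the experimental results validate the superiority of the MTHGNN model.

Abstract (original)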

Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages. Recently, Graph Neural Networks-based micro-video recommendation has displayed performance improvement for many kinds of recommendation tasks. However, the existing works fail to fully consider the characteristics of micro-videos, such as the high timeliness of news nature micro-video recommendation and sequential interactions of frequently changed interests. In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized news nature micro-video recommendation based on sequential sessions, where characteristics of micro-videos are comprehensively studied, users’ preference is mined via multi-aggregator, the temporal and dynamic changes of users’ preference are captured, and timeliness is considered. Through the comparison with the state-of-the-arts, the experimental results validate the superiority of our MTHGNN model.
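The abstract does not define how the multiple aggregators or the timeliness factor are computed. As a rough illustration only, the sketch below assumes that each neighbor embedding is weighted by an exponential time decay and that time-weighted mean, max, and sum aggregations are concatenated. The functions `time_decay` and `multi_aggregate` and the `half_life` parameter are hypothetical, not taken from the paper, and time-warping itself is not modeled.

```python
from typing import List, Sequence

Vector = List[float]


def time_decay(age_hours: float, half_life: float = 24.0) -> float:
    """Hypothetical timeliness weight: halves every `half_life` hours."""
    return 0.5 ** (age_hours / half_life)


def multi_aggregate(neighbors: Sequence[Vector], ages: Sequence[float],
                    half_life: float = 24.0) -> Vector:
    """Concatenate time-weighted mean, max, and sum aggregations of neighbor embeddings."""
    if not neighbors:
        raise ValueError("need at least one neighbor")
    dim = len(neighbors[0])
    weights = [time_decay(a, half_life) for a in ages]
    # Scale each neighbor embedding by its timeliness weight.
    weighted = [[w * x for x in vec] for w, vec in zip(weights, neighbors)]
    total_w = sum(weights)
    summed = [sum(vec[d] for vec in weighted) for d in range(dim)]
    mean = [s / total_w for s in summed]  # weighted mean: sum(w * x) / sum(w)
    maxed = [max(vec[d] for vec in weighted) for d in range(dim)]
    return mean + maxed + summed
```

For example, `multi_aggregate([[1.0, 0.0], [0.0, 1.0]], [0.0, 24.0])` gives the day-old interaction half the weight of the fresh one, so fresher interests dominate the aggregated user representation.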

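arXiv information

Authors	Jinkun Han, Wei Li, Zhipeng Cai, Yingshu Li
Published	2025-03-21 16:08:18+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.IR

Toward a method for LLM-enabled Indoor Navigation

Posted: March 24, 2025 by jarxiv

Summary

Indoor navigation presents unique challenges due to complex layouts, the lack of GPS signals, and accessibility concerns.
Existing solutions often struggle with real-time adaptability and user-specific needs.
In this work, we explore the potential of a large language model (LLM), namely ChatGPT, to generate natural, context-aware navigation instructions from indoor map images.
We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes.
Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 52% correct indications and a maximum of 62%.
The results do not appear to depend on the complexity of the layout or of the expected path, but rather on the number of points of interest and the abundance of visual information, both of which negatively affect performance.

Abstract (original)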

Indoor navigation presents unique challenges due to complex layouts, lack of GPS signals, and accessibility concerns. Existing solutions often struggle with real-time adaptability and user-specific needs. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 52% correct indications and a maximum of 62%. The results do not appear to depend on the complexity of the layout or the complexity of the expected path, but rather on the number of points of interest and the abundance of visual information, which negatively affect the performance.

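arXiv information

Authors	Alberto Coffrini, Mohammad Amin Zadenoori, Paolo Barsocchi, Francesco Furfari, Antonino Crivello, Alessio Ferrari
Published	2025-03-21 16:17:59+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Graph Masked Language Models

Posted: March 24, 2025 by jarxiv

Summary

Language models (LMs) and graph neural networks (GNNs) have each shown great promise in their respective areas, yet integrating structured graph data with rich textual information remains challenging.
In this work, we propose Graph Masked Language Models (GMLM), a novel dual-branch architecture that combines the structural learning of GNNs with the contextual power of pretrained language models.
Our approach introduces two key innovations: (i) a semantic masking strategy that leverages graph topology to selectively mask nodes based on their structural importance, and (ii) a soft masking mechanism that interpolates between the original node features and a learnable mask token, ensuring smoother information flow during training.
Extensive experiments on multiple node classification and language understanding benchmarks show that GMLM not only achieves state-of-the-art performance but also exhibits enhanced robustness and stability.
This work highlights the benefits of integrating structured and unstructured data representations to improve graph learning.

Abstract (original)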

Language Models (LMs) and Graph Neural Networks (GNNs) have shown great promise in their respective areas, yet integrating structured graph data with rich textual information remains challenging. In this work, we propose Graph Masked Language Models (GMLM), a novel dual-branch architecture that combines the structural learning of GNNs with the contextual power of pretrained language models. Our approach introduces two key innovations: (i) a semantic masking strategy that leverages graph topology to selectively mask nodes based on their structural importance, and (ii) a soft masking mechanism that interpolates between original node features and a learnable mask token, ensuring smoother information flow during training. Extensive experiments on multiple node classification and language understanding benchmarks demonstrate that GMLM not only achieves state-of-the-art performance but also exhibits enhanced robustness and stability. This work underscores the benefits of integrating structured and unstructured data representations for improved graph learning.
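The abstract describes soft masking as an interpolation between a node's original features and a mask token. A minimal sketch, assuming node degree as the proxy for "structural importance" and a fixed interpolation weight `alpha`; in GMLM the mask token is learnable and the actual importance criterion may differ. The function names are hypothetical.

```python
from typing import Dict, List, Set

Vector = List[float]


def select_nodes_to_mask(adjacency: Dict[int, Set[int]], ratio: float) -> List[int]:
    """Pick the highest-degree nodes (assumed proxy for structural importance)."""
    k = max(1, int(len(adjacency) * ratio))
    # Sort by descending degree, breaking ties by node id for determinism.
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return ranked[:k]


def soft_mask(features: Dict[int, Vector], mask_token: Vector,
              nodes: List[int], alpha: float) -> Dict[int, Vector]:
    """Replace each selected node's features with (1 - alpha) * x + alpha * mask_token."""
    out = {n: list(x) for n, x in features.items()}  # copy; leave the input untouched
    for n in nodes:
        out[n] = [(1 - alpha) * xi + alpha * mi
                  for xi, mi in zip(features[n], mask_token)]
    return out
```

With `alpha = 1.0` this reduces to hard masking; smaller values keep part of the original signal, which is the smoother information flow the abstract refers to.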

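arXiv information

Authors	Aarush Sinha, OM Kumar CU
Published	2025-03-21 16:42:49+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Preference-Guided Diffusion for Multi-Objective Offline Optimization

Posted: March 24, 2025 by jarxiv

Summary

Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values.
In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by leveraging a classifier-based guidance mechanism.
The guidance classifier is a preference model trained to predict the probability that one design dominates another, steering the diffusion model toward optimal regions of the design space.
Crucially, this preference model generalizes beyond the training distribution, enabling the discovery of Pareto-optimal solutions outside the observed dataset.
We introduce a novel diversity-aware preference guidance that augments the Pareto dominance preference with diversity criteria.
This ensures that the generated solutions are both optimal and well distributed across the objective space, a capability absent in prior generative methods for offline multi-objective optimization.
We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods.
Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse, high-quality solutions that approximate the Pareto front well.

Abstract (original)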

Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values. In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by leveraging a classifier-based guidance mechanism. Our guidance classifier is a preference model trained to predict the probability that one design dominates another, directing the diffusion model toward optimal regions of the design space. Crucially, this preference model generalizes beyond the training distribution, enabling the discovery of Pareto-optimal solutions outside the observed dataset. We introduce a novel diversity-aware preference guidance, augmenting Pareto dominance preference with diversity criteria. This ensures that generated solutions are optimal and well-distributed across the objective space, a capability absent in prior generative methods for offline multi-objective optimization. We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions that approximate the Pareto front well.
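The guidance classifier is trained to predict the probability that one design dominates another. The sketch below shows how such pairwise training labels could be derived from an offline dataset using Pareto dominance, assuming all objectives are minimized. The helper names are hypothetical, and the classifier and diffusion sampler themselves are omitted.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def preference_pairs(objs: List[Objectives]) -> List[Tuple[int, int, int]]:
    """Ordered (i, j, label) pairs: label 1 if design i dominates design j, else 0."""
    return [(i, j, int(dominates(objs[i], objs[j])))
            for i, j in permutations(range(len(objs)), 2)]


def pareto_front(objs: List[Objectives]) -> List[int]:
    """Indices of designs that no other design dominates."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]
```

A binary classifier fit on these labels outputs an estimate of P(i dominates j); its gradient with respect to the design is what classifier guidance would add during sampling.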

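arXiv information

Authors	Yashas Annadani, Syrine Belakaria, Stefano Ermon, Stefan Bauer, Barbara E Engelhardt
Published	2025-03-21 16:49:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving

Posted: March 24, 2025 by jarxiv

Summary

Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, video, and audio beyond text.
However, serving LMMs efficiently in production environments poses significant challenges due to their complex architectures and the heterogeneous characteristics across their multi-stage inference pipelines.
We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models, revealing key systems design implications.
We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions and bursty traffic patterns.
Based on these insights, we propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling.
ModServe dynamically reconfigures stages and handles bursty traffic with modality-aware scheduling and autoscaling to meet tail-latency SLOs while minimizing costs.
ModServe achieves 3.3-5.5x higher throughput (leading to 25-41.3% cost savings) while meeting SLOs on a 128-GPU cluster with production traces.

Abstract (original)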

Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures and heterogeneous characteristics across their multi-stage inference pipelines. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models, revealing key systems design implications. We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions and bursty traffic patterns. Based on these insights, we propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling. ModServe dynamically reconfigures stages and handles bursty traffic with modality-aware scheduling and autoscaling to meet tail latency SLOs while minimizing costs. ModServe achieves 3.3-5.5x higher throughput (leading to 25-41.3% cost saving) while meeting SLOs on a 128-GPU cluster with production traces.
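One reading of "decouples stages for independent optimization and adaptive scaling" is that each inference stage (for example, an image encoder and an LLM decoder) gets its own replica count sized to its own load. The sketch below assumes a simple capacity rule with a headroom factor for bursts. The `Stage` class, the stage names, the per-replica throughputs, and the `headroom` parameter are illustrative assumptions, not ModServe's actual scaling policy.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Stage:
    name: str
    per_replica_rps: float  # assumed requests/s one replica sustains within the SLO


def plan_replicas(stages: Dict[str, Stage], demand_rps: Dict[str, float],
                  headroom: float = 1.25, min_replicas: int = 1) -> Dict[str, int]:
    """Size each stage independently: ceil(demand * headroom / per-replica capacity)."""
    plan = {}
    for key, stage in stages.items():
        needed = demand_rps.get(key, 0.0) * headroom / stage.per_replica_rps
        plan[key] = max(min_replicas, math.ceil(needed))
    return plan
```

Under this assumption, text-only requests never reach the image encoder, so the encoder's demand can be far lower than the decoder's, and a monolithic deployment that scales both together would over-provision one of them.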

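arXiv information

Authors	Haoran Qiu, Anish Biswas, Zihan Zhao, Jayashree Mohan, Alind Khare, Esha Choukse, Íñigo Goiri, Zeyu Zhang, Haiying Shen, Chetan Bansal, Ramachandran Ramjee, Rodrigo Fonseca
Published	2025-03-21 16:53:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.DC

LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language

Posted: March 24, 2025 by jarxiv

Summary

Bimanual robotic manipulation offers significant versatility, but it also presents inherent challenges due to the complexity of spatial and temporal coordination between the two hands.
Existing works focus mainly on attaining human-level manipulation skills for robotic hands, while little attention has been paid to task planning on long-horizon timescales.
With their outstanding in-context learning and zero-shot generation abilities, large language models (LLMs) have been applied to and grounded in diverse robotic embodiments to facilitate task planning.
However, LLMs still suffer from errors in long-horizon reasoning and from hallucinations in complex robotic tasks, and they offer no guarantee of logical correctness when generating plans.
Previous works, such as LLM+P, have extended LLMs with symbolic planners.
However, none have been successfully applied to bimanual robots.
New challenges inevitably arise in bimanual manipulation, requiring not only effective task decomposition but also efficient task allocation.
To address these challenges, this paper introduces LLM+MAP, a bimanual planning framework that integrates LLM reasoning and multi-agent planning to automate effective and efficient bimanual task planning.
We conduct simulated experiments on various long-horizon manipulation tasks of differing complexity.
Our method is built with GPT-4o as the backend, and we compare its performance against plans generated directly by LLMs, including GPT-4o and V3, as well as the recent strong reasoning models o1 and R1.
By analyzing metrics such as planning time, success rate, group debits, and planning-step reduction rate, we demonstrate the superior performance of LLM+MAP while also providing insights into robotic reasoning.
Code is available at https://github.com/Kchu/LLM-MAP.

Abstract (original)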

Bimanual robotic manipulation provides significant versatility, but also presents an inherent challenge due to the complexity involved in the spatial and temporal coordination between two hands. Existing works predominantly focus on attaining human-level manipulation skills for robotic hands, yet little attention has been paid to task planning on long-horizon timescales. With their outstanding in-context learning and zero-shot generation abilities, Large Language Models (LLMs) have been applied and grounded in diverse robotic embodiments to facilitate task planning. However, LLMs still suffer from errors in long-horizon reasoning and from hallucinations in complex robotic tasks, lacking a guarantee of logical correctness when generating the plan. Previous works, such as LLM+P, extended LLMs with symbolic planners. However, none have been successfully applied to bimanual robots. New challenges inevitably arise in bimanual manipulation, necessitating not only effective task decomposition but also efficient task allocation. To address these challenges, this paper introduces LLM+MAP, a bimanual planning framework that integrates LLM reasoning and multi-agent planning, automating effective and efficient bimanual task planning. We conduct simulated experiments on various long-horizon manipulation tasks of differing complexity. Our method is built using GPT-4o as the backend, and we compare its performance against plans generated directly by LLMs, including GPT-4o, V3 and also recent strong reasoning models o1 and R1. By analyzing metrics such as planning time, success rate, group debits, and planning-step reduction rate, we demonstrate the superior performance of LLM+MAP, while also providing insights into robotic reasoning. Code is available at https://github.com/Kchu/LLM-MAP.
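To illustrate why task allocation matters on top of task decomposition, the sketch below greedily assigns dependency-ordered subtasks to two arms, one time step at a time, so that independent subtasks run in parallel. It assumes unit-duration subtasks and defines the reduction rate as 1 - (parallel steps / sequential steps). The task names, this formula, and the functions are hypothetical; this is not the paper's PDDL-based multi-agent planner or its exact metric definition.

```python
from typing import Dict, List, Set


def schedule_two_arms(deps: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Greedy list scheduling of unit-duration tasks onto two arms.

    deps maps each task to the tasks that must finish before it can start.
    Returns one dict per time step, mapping arm name -> task.
    """
    done: Set[str] = set()
    remaining = set(deps)
    steps: List[Dict[str, str]] = []
    while remaining:
        # A task is ready once all of its prerequisites are done.
        ready = sorted(t for t in remaining if deps[t] <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        step = dict(zip(("left", "right"), ready[:2]))
        steps.append(step)
        done.update(step.values())
        remaining.difference_update(step.values())
    return steps


def step_reduction_rate(deps: Dict[str, Set[str]]) -> float:
    """Assumed metric: 1 - (parallel steps / sequential steps)."""
    return 1 - len(schedule_two_arms(deps)) / len(deps)
```

For a toy drawer task where opening the drawer and picking up a cup are independent, both arms work in the first step, so four subtasks finish in three steps.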

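arXiv information

Authors	Kun Chu, Xufeng Zhao, Cornelius Weber, Stefan Wermter
Published	2025-03-21 17:04:01+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.RO

CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities

Posted: March 24, 2025 by jarxiv

Summary

Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications.
This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities.
However, existing benchmarks fall short because they are limited to abstracted Capture the Flag competitions or lack comprehensive coverage.
Building a benchmark of real-world vulnerabilities requires both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable threats.
To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures.
In CVE-Bench, we design a sandbox framework that enables LLM agents to exploit vulnerable web applications in scenarios that mimic real-world conditions, while also providing effective evaluation of their exploits.
Our evaluation shows that the state-of-the-art agent framework can resolve up to 13% of vulnerabilities.

Abstract (original)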

Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications. This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities. However, existing benchmarks fall short as they are limited to abstracted Capture the Flag competitions or lack comprehensive coverage. Building a benchmark for real-world vulnerabilities involves both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable threats. To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures. In CVE-Bench, we design a sandbox framework that enables LLM agents to exploit vulnerable web applications in scenarios that mimic real-world conditions, while also providing effective evaluation of their exploits. Our evaluation shows that the state-of-the-art agent framework can resolve up to 13% of vulnerabilities.

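arXiv information

Authors	Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, Adarsh Danda, Richard Fang, Conner Jensen, Eric Ihli, Jason Benn, Jet Geronimo, Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang
Published	2025-03-21 17:32:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CR, I.2.1

GreenIQ: A Deep Search Platform for Comprehensive Carbon Market Analysis and Automated Report Generation

Posted: March 24, 2025 by jarxiv

Summary

This study introduces GreenIQ, an AI-powered deep search platform designed to revolutionise carbon market intelligence through autonomous analysis and automated report generation.
Carbon markets operate across diverse regulatory landscapes and generate vast amounts of heterogeneous data from policy documents, industry reports, academic literature, and real-time trading platforms.
Traditional research approaches remain labour-intensive, slow, and difficult to scale.
GreenIQ addresses these limitations through a multi-agent architecture powered by large language models (LLMs) that integrates five specialised AI agents: a Main Researcher Agent for intelligent information retrieval, a Report Writing Agent for structured synthesis, a Final Reviewer Agent for accuracy verification, a Data Visualisation Agent for enhanced interpretability, and a Translator Agent for multilingual adaptation.
The system seamlessly integrates structured and unstructured information with AI-driven citation verification, ensuring high transparency and reliability.
GreenIQ delivers a 99.2% reduction in processing time and a 99.7% cost reduction compared with traditional research methodologies.
A novel AI persona-based evaluation framework involving 16 domain-specific AI personas highlights its superior cross-jurisdictional analytical capabilities and regulatory insight generation.
By streamlining carbon market research, GreenIQ sets new standards in AI-driven research synthesis, policy analysis, and sustainability finance.
It offers an efficient and scalable framework for environmental and financial intelligence, enabling more accurate, timely, and cost-effective decision-making in complex regulatory landscapes.

Abstract (original)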

This study introduces GreenIQ, an AI-powered deep search platform designed to revolutionise carbon market intelligence through autonomous analysis and automated report generation. Carbon markets operate across diverse regulatory landscapes, generating vast amounts of heterogeneous data from policy documents, industry reports, academic literature, and real-time trading platforms. Traditional research approaches remain labour-intensive, slow, and difficult to scale. GreenIQ addresses these limitations through a multi-agent architecture powered by Large Language Models (LLMs), integrating five specialised AI agents: a Main Researcher Agent for intelligent information retrieval, a Report Writing Agent for structured synthesis, a Final Reviewer Agent for accuracy verification, a Data Visualisation Agent for enhanced interpretability, and a Translator Agent for multilingual adaptation. The system achieves seamless integration of structured and unstructured information with AI-driven citation verification, ensuring high transparency and reliability. GreenIQ delivers a 99.2% reduction in processing time and a 99.7% cost reduction compared to traditional research methodologies. A novel AI persona-based evaluation framework involving 16 domain-specific AI personas highlights its superior cross-jurisdictional analytical capabilities and regulatory insight generation. GreenIQ sets new standards in AI-driven research synthesis, policy analysis, and sustainability finance by streamlining carbon market research. It offers an efficient and scalable framework for environmental and financial intelligence, enabling more accurate, timely, and cost-effective decision-making in complex regulatory landscapes.
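The five agents listed in the abstract can be pictured as successive stages of one report pipeline. The sketch below wires placeholder functions in an assumed order (research, writing, review, visualisation, translation). Every stub stands in for an LLM-backed agent, and the `Report` fields, function names, and stub outputs are hypothetical rather than GreenIQ's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    query: str
    sources: List[str] = field(default_factory=list)
    draft: str = ""
    verified: bool = False
    charts: List[str] = field(default_factory=list)
    translations: Dict[str, str] = field(default_factory=dict)


# Each stub stands in for an LLM-backed agent (hypothetical behaviour).
def main_researcher(r: Report) -> Report:
    r.sources = [f"source for: {r.query}"]
    return r


def report_writer(r: Report) -> Report:
    r.draft = f"Report on {r.query} citing {len(r.sources)} source(s)."
    return r


def final_reviewer(r: Report) -> Report:
    # Placeholder check: the draft must cite at least one retrieved source.
    r.verified = bool(r.sources) and "citing" in r.draft
    return r


def data_visualiser(r: Report) -> Report:
    r.charts = ["price_trend.png"]
    return r


def translator(r: Report, languages=("fr", "es")) -> Report:
    r.translations = {lang: f"[{lang}] {r.draft}" for lang in languages}
    return r


PIPELINE: List[Callable[[Report], Report]] = [
    main_researcher, report_writer, final_reviewer, data_visualiser, translator,
]


def run(query: str) -> Report:
    """Pass one shared Report object through every agent in order."""
    report = Report(query)
    for agent in PIPELINE:
        report = agent(report)
    return report
```

Keeping each agent as a function over a shared state object makes it easy to swap one stage (for example, the reviewer) without touching the others.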

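arXiv information

Authors	Oluwole Fagbohun, Sai Yashwanth, Akinyemi Sadeeq Akintola, Ifeoluwa Wurola, Lanre Shittu, Aniema Inyang, Oluwatimilehin Odubola, Udodirim Offia, Said Olanrewaju, Ogidan Toluwaleke, Ilemona Abutu, Taiwo Akinbolaji
Published	2025-03-21 17:33:33+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI

Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs

Posted: March 24, 2025 by jarxiv

Summary

Large language models (LLMs) have showcased remarkable capabilities in conversational AI, enabling open-domain responses in chatbots as well as advanced processing of conversations such as summarization, intent classification, and insight generation.
However, these models are resource-intensive, demanding substantial memory and computational power.
To address this, we propose a cost-effective solution that, rather than processing every snippet, filters the conversational snippets of interest for LLM processing, tailored to the target downstream application.
In this work, we introduce an innovative approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute-constrained environments.
Our method combines different strategies to create a diverse multi-party conversational dataset, which is annotated with the target intents and then used to fine-tune a MobileBERT model for multi-label intent classification.
This model strikes a balance between efficiency and performance, effectively filtering conversation snippets based on their intents.
By passing only the relevant snippets to the LLM for further processing, our approach significantly reduces overall operational costs, depending on the intents and the data distribution, as demonstrated in our experiments.

Abstract (original)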

Large language models (LLMs) have showcased remarkable capabilities in conversational AI, enabling open-domain responses in chat-bots, as well as advanced processing of conversations like summarization, intent classification, and insights generation. However, these models are resource-intensive, demanding substantial memory and computational power. To address this, we propose a cost-effective solution that filters conversational snippets of interest for LLM processing, tailored to the target downstream application, rather than processing every snippet. In this work, we introduce an innovative approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute power constrained environments. Our method combines different strategies to create a diverse multi-party conversational dataset, that is annotated with the target intents and is then used to fine-tune the MobileBERT model for multi-label intent classification. This model achieves a balance between efficiency and performance, effectively filtering conversation snippets based on their intents. By passing only the relevant snippets to the LLM for further processing, our approach significantly reduces overall operational costs depending on the intents and the data distribution as demonstrated in our experiments.
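The filtering step can be sketched as thresholding a multi-label classifier's per-intent scores and forwarding a snippet to the LLM only if at least one target intent clears the threshold. In the sketch below, `keyword_scorer` is a hypothetical, trivial stand-in for the fine-tuned MobileBERT classifier, and the intent names and the 0.5 threshold are assumptions.

```python
from typing import Callable, Dict, Iterable, List

IntentScorer = Callable[[str], Dict[str, float]]


def keyword_scorer(snippet: str) -> Dict[str, float]:
    """Toy stand-in for a multi-label intent classifier (independent per-intent scores)."""
    text = snippet.lower()
    return {
        "complaint": 0.9 if "refund" in text or "broken" in text else 0.1,
        "scheduling": 0.8 if "meeting" in text or "tomorrow" in text else 0.05,
    }


def filter_snippets(snippets: Iterable[str], target_intents: Iterable[str],
                    scorer: IntentScorer, threshold: float = 0.5) -> List[str]:
    """Keep only snippets where at least one target intent scores >= threshold."""
    targets = set(target_intents)
    kept = []
    for s in snippets:
        scores = scorer(s)
        if any(scores.get(i, 0.0) >= threshold for i in targets):
            kept.append(s)
    return kept
```

Because the LLM only sees the kept snippets, the saving depends on what fraction of the conversation matches the target intents, which is consistent with the abstract's note that costs depend on the intents and the data distribution.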

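arXiv information

Authors	Reem Gody, Mohamed Abdelghaffar, Mohammed Jabreel, Ahmed Tawfik
Published	2025-03-21 17:34:37+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Capturing Individual Human Preferences with Reward Features

Posted: March 24, 2025 by jarxiv

Summary

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people.
We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, such as the training of large language models.
We propose a method to specialize a reward model to a person or group of people.
Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features.
We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data.
We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and with adaptive counterparts, including models that perform in-context personalization.
Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.

Abstract (original)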

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
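The core modelling assumption is that a person's reward is a linear combination of shared reward features, r_u(x, y) = w_u · φ(x, y). The sketch below keeps the features φ fixed and fits only a new user's weights w_u from a few pairwise preferences, using a Bradley-Terry (logistic) loss and plain gradient descent. The feature vectors, learning rate, and step count are illustrative assumptions, and how the paper learns φ itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adapt_user_weights(pairs: List[Tuple[Vector, Vector]], dim: int,
                       lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit w so that sigmoid(w . (phi_chosen - phi_rejected)) is high for every pair.

    pairs: (features of the preferred response, features of the rejected one).
    Minimizes the mean Bradley-Terry negative log-likelihood by gradient descent.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            p = sigmoid(dot(w, diff))
            # Gradient of -log(p) with respect to w is -(1 - p) * diff.
            for k in range(dim):
                grad[k] -= (1.0 - p) * diff[k]
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def reward(w: Vector, phi: Vector) -> float:
    """Personalized reward as a linear combination of shared features."""
    return dot(w, phi)
```

Since only `dim` weights are fit per person, a handful of comparisons can be enough to adapt, which matches the abstract's emphasis on quickly specializing the reward model to an individual.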

arXiv information

Authors	André Barreto, Vincent Dumoulin, Yiran Mao, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle
Published	2025-03-21 17:39:33+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, stat.ML
