jarxiv | Japanese arxiv | Page 125

Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features

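Posted: June 11, 2025 | Author: jarxiv

Summary

This study examines the lexical and syntactic interventions that human and LLM proofreading make to improve overall intelligibility in identical second-language writings, and evaluates how consistent the outcomes are across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b).
The findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words.
However, LLM proofreading takes a more generative approach, extensively reworking vocabulary and sentence structure, for example by using more diverse and sophisticated vocabulary and adding more adjective modifiers to noun phrases.
The proofreading outcomes are highly consistent across the three models in major lexical and syntactic features.

Abstract (original)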

This study examines the lexical and syntactic interventions of human and LLM proofreading aimed at improving overall intelligibility in identical second language writings, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading exhibits a more generative approach, extensively reworking vocabulary and sentence structures, such as employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.
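To make "bigram lexical features" concrete, the sketch below counts adjacent word pairs and a simple type-token ratio for a text before and after proofreading. It is only an illustration: the abstract does not say which bigram or diversity measures the authors compute, the two sentences are invented examples, and the helper names are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def bigram_counts(tokens):
    # Count each pair of adjacent tokens.
    return Counter(zip(tokens, tokens[1:]))

def type_token_ratio(tokens):
    # Share of distinct word forms: a crude proxy for lexical diversity.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Invented example sentences (not taken from the study's data).
original = "I am agree that the technology make life more easy."
proofread = "I agree that technology makes life easier."

for label, text in (("original", original), ("proofread", proofread)):
    toks = tokenize(text)
    print(label, len(bigram_counts(toks)), round(type_token_ratio(toks), 2))
```

A real analysis would compare such counts against a reference corpus rather than within a single sentence; that step is omitted here.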

arXiv information

Authors	Hakyung Sung, Karla Csuros, Min-Chang Sung
Published	2025-06-10 17:49:10+00:00

Source and services used

arxiv.jp, Google

Categories: cs.CL

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

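Posted: June 11, 2025 | Author: jarxiv

Summary

Test-time scaling offers a promising path to improving LLM reasoning by using more compute at inference time.
However, the true promise of this paradigm lies in extrapolation, that is, improved performance on hard problems as LLMs keep "thinking" for longer, beyond the maximum token budget they were trained on.
Surprisingly, we find that most existing reasoning models do not extrapolate well.
We show that one way to enable extrapolation is to train the LLM to perform in-context exploration: to spend its test-time budget effectively by chaining operations (such as generation, verification, and refinement) or by testing multiple hypotheses before it commits to an answer.
To enable in-context exploration, we identify three key ingredients of our recipe e3: (1) chaining skills in which the base LLM has asymmetric competence, for example chaining verification (easy) with generation (hard), as a way to implement in-context search;
(2) leveraging "negative" gradients from incorrect traces to amplify exploration during RL, which yields longer search traces that chain additional asymmetries;
(3) coupling task difficulty with the training token budget via a specifically designed curriculum, to structure in-context exploration.
Our recipe e3 produces the best known 1.7B model according to AIME'25 and HMMT'25 scores, and extrapolates to 2x the training token budget.
Our e3-1.7B model not only attains high pass@1 scores but also improves pass@k over the base model.

Abstract (original)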

Test-time scaling offers a promising path to improve LLM reasoning by utilizing more compute at inference time; however, the true promise of this paradigm lies in extrapolation (i.e., improvement in performance on hard problems as LLMs keep ‘thinking’ for longer, beyond the maximum token budget they were trained on). Surprisingly, we find that most existing reasoning models do not extrapolate well. We show that one way to enable extrapolation is by training the LLM to perform in-context exploration: training the LLM to effectively spend its test time budget by chaining operations (such as generation, verification, refinement, etc.), or testing multiple hypotheses before it commits to an answer. To enable in-context exploration, we identify three key ingredients as part of our recipe e3: (1) chaining skills that the base LLM has asymmetric competence in, e.g., chaining verification (easy) with generation (hard), as a way to implement in-context search; (2) leveraging ‘negative’ gradients from incorrect traces to amplify exploration during RL, resulting in longer search traces that chains additional asymmetries; and (3) coupling task difficulty with training token budget during training via a specifically-designed curriculum to structure in-context exploration. Our recipe e3 produces the best known 1.7B model according to AIME’25 and HMMT’25 scores, and extrapolates to 2x the training token budget. Our e3-1.7B model not only attains high pass@1 scores, but also improves pass@k over the base model.
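The abstract reports pass@1 and pass@k without saying how they are estimated. A commonly used choice is the unbiased estimator computed from n sampled answers per problem, of which c are correct; the sketch below assumes that estimator and is not taken from the paper. The function name is hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given n samples for a problem of which c were correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this value over all problems in a benchmark gives the reported score; with k = 1 it reduces to the fraction of correct samples, c / n.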

arXiv information

Authors	Amrith Setlur, Matthew Y. R. Yang, Charlie Snell, Jeremy Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar
Published	2025-06-10 17:52:42+00:00

Categories: cs.CL, cs.LG

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

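Posted: June 11, 2025 | Author: jarxiv

Summary

Vision-language models (VLMs) show impressive abilities to answer questions about visual inputs (e.g., counting objects in an image), yet reach higher accuracy on an analogous task in text (e.g., counting words in a text).
We investigate this accuracy gap by identifying and comparing circuits (task-specific computational sub-graphs) in the different modalities.
We show that although circuits are largely disjoint between modalities, they implement relatively similar functionality: the differences lie primarily in how modality-specific data positions (an image or a text sequence) are processed.
Zooming in on the image data representations, we observe that they align with the higher-performing analogous textual representations only in later layers, too late in processing to effectively influence subsequent positions.
To overcome this, we patch the representations of visual data tokens from later layers back into earlier layers.
In experiments with multiple tasks and models, this simple intervention closes a third of the performance gap between the modalities, on average.
Our analysis sheds light on the multimodal performance gap in VLMs and suggests a training-free approach for reducing it.

Abstract (original)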

Vision-Language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the circuits – the task-specific computational sub-graphs – in different modalities. We show that while circuits are largely disjoint between modalities, they implement relatively similar functionalities: the differences lie primarily in processing modality-specific data positions (an image or a text sequence). Zooming in on the image data representations, we observe they become aligned with the higher-performing analogous textual representations only towards later layers, too late in processing to effectively influence subsequent positions. To overcome this, we patch the representations of visual data tokens from later layers back into earlier layers. In experiments with multiple tasks and models, this simple intervention closes a third of the performance gap between the modalities, on average. Our analysis sheds light on the multi-modal performance gap in VLMs and suggests a training-free approach for reducing it.

arXiv information

Authors	Yaniv Nikankin, Dana Arad, Yossi Gandelsman, Yonatan Belinkov
Published	2025-06-10 17:59:21+00:00

Categories: 68T5, cs.CL, I.2.7

On The Impact of Merge Request Deviations on Code Review Practices

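Posted: June 11, 2025 | Author: jarxiv

Summary

Code review is a key practice in software engineering, ensuring quality and collaboration.
However, industrial Merge Request (MR) workflows often deviate from standardized review processes, with many MRs serving non-review purposes (e.g., drafts, rebases, or dependency updates).
We term these cases deviations and hypothesize that ignoring them biases analytics and undermines ML models for review analysis.
We identify seven deviation categories, occurring in 37.02% of MRs, and propose a few-shot learning detection method (91% accuracy).
When deviations are excluded, ML models that predict review completion time improve in 53.33% of cases (by up to 2.25x) and show significant shifts in feature importance (47% overall, 60% top-k).
Our contributions include (1) a taxonomy of MR deviations, (2) an AI-driven detection approach, and (3) empirical evidence of their impact on ML-based review analytics.
This work helps practitioners optimize review efforts and obtain reliable insights.

Abstract (original)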

Code review is a key practice in software engineering, ensuring quality and collaboration. However, industrial Merge Request (MR) workflows often deviate from standardized review processes, with many MRs serving non-review purposes (e.g., drafts, rebases, or dependency updates). We term these cases deviations and hypothesize that ignoring them biases analytics and undermines ML models for review analysis. We identify seven deviation categories, occurring in 37.02% of MRs, and propose a few-shot learning detection method (91% accuracy). By excluding deviations, ML models predicting review completion time improve performance in 53.33% of cases (up to 2.25x) and exhibit significant shifts in feature importance (47% overall, 60% top-*k*). Our contributions include: (1) a taxonomy of MR deviations, (2) an AI-driven detection approach, and (3) empirical evidence of their impact on ML-based review analytics. This work aids practitioners in optimizing review efforts and ensuring reliable insights.

arXiv information

Authors	Samah Kansab, Francis Bordeleau, Ali Tizghadam
Published	2025-06-10 14:51:20+00:00

Categories: cs.AI, cs.LG, cs.SE

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

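Posted: June 11, 2025 | Author: jarxiv

Summary

This study explores the neural and behavioral consequences of LLM-assisted essay writing.
Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools).
Each completed three sessions under the same condition.
In a fourth session, LLM users were reassigned to the Brain-only group (LLM-to-Brain), and Brain-only users were reassigned to the LLM condition (Brain-to-LLM).
A total of 54 participants took part in Sessions 1-3, and 18 completed Session 4. We used electroencephalography (EEG) to assess cognitive load during essay writing, analyzed the essays using NLP, and scored them with help from human teachers and an AI judge.
Across groups, NERs, n-gram patterns, and topic ontology showed within-group homogeneity.
EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks.
Search Engine users showed moderate engagement.
LLM users displayed the weakest connectivity.
Cognitive activity scaled down in relation to external tool use.
In Session 4, LLM-to-Brain participants showed reduced alpha and beta connectivity, indicating under-engagement.
Brain-to-LLM users exhibited higher memory recall and activation of occipito-parietal and prefrontal areas, similar to Search Engine users.
Self-reported ownership of essays was lowest in the LLM group and highest in the Brain-only group.
LLM users also struggled to quote their own work accurately.
While LLMs offer immediate convenience, our findings highlight potential cognitive costs.
Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels.
These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.

Abstract (original)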

This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed three sessions under the same condition. In a fourth session, LLM users were reassigned to Brain-only group (LLM-to-Brain), and Brain-only users were reassigned to LLM condition (Brain-to-LLM). A total of 54 participants took part in Sessions 1-3, with 18 completing session 4. We used electroencephalography (EEG) to assess cognitive load during essay writing, and analyzed essays using NLP, as well as scoring essays with the help from human teachers and an AI judge. Across groups, NERs, n-gram patterns, and topic ontology showed within-group homogeneity. EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity. Cognitive activity scaled down in relation to external tool use. In session 4, LLM-to-Brain participants showed reduced alpha and beta connectivity, indicating under-engagement. Brain-to-LLM users exhibited higher memory recall and activation of occipito-parietal and prefrontal areas, similar to Search Engine users. Self-reported ownership of essays was the lowest in the LLM group and the highest in the Brain-only group. LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI’s role in learning.

arXiv information

Authors	Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, Pattie Maes
Published	2025-06-10 15:04:28+00:00

Categories: cs.AI

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

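Posted: June 11, 2025 | Author: jarxiv

Summary

We introduce SeerAttention-R, a sparse attention framework specifically tailored to the long decoding of reasoning models.
Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding.
With its lightweight plug-in gating, SeerAttention-R is flexible and can be easily integrated into existing pretrained models without modifying the original parameters.
We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with a 4K token budget on the AIME benchmark under large sparse attention block sizes (64/128).
Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on an H100 GPU at 90% sparsity.
Code is available at https://github.com/microsoft/SeerAttention.

Abstract (original)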

We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. With a lightweight plug-in gating, SeerAttention-R is flexible and can be easily integrated into existing pretrained model without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with 4K token budget in AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on H100 GPU at 90% sparsity. Code is available at: https://github.com/microsoft/SeerAttention.
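The idea of block-sparse decoding can be sketched in a few lines of plain Python: a gate scores each block of cached keys for the current query, and attention is computed only over the highest-scoring blocks. This is a toy illustration, not the SeerAttention-R implementation. In particular, the gate here is an assumed stand-in (the query dotted with each block's mean key), whereas the paper learns its gate by self-distillation, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_blocks(query, keys, block_size, keep):
    """Score each key block and return the indices of the `keep` best blocks."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        # Assumed gate: mean-pool the block's keys, then dot with the query.
        pooled = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, pooled), start // block_size))
    top = sorted(scores, reverse=True)[:keep]
    return sorted(idx for _, idx in top)

def sparse_decode_step(query, keys, values, block_size, keep):
    """One decoding step that attends only to tokens in the selected blocks."""
    blocks = select_blocks(query, keys, block_size, keep)
    positions = [p for b in blocks
                 for p in range(b * block_size,
                                min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[p]) * scale for p in positions])
    dim = len(values[0])
    out = [sum(w * values[p][d] for w, p in zip(weights, positions))
           for d in range(dim)]
    return out, blocks
```

Keeping one block in ten corresponds to the 90% sparsity setting mentioned in the abstract; the reported speedups depend on a specialized GPU kernel that this sketch does not attempt to reproduce.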

arXiv information

Authors	Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang
Published	2025-06-10 15:17:26+00:00

Categories: cs.AI, cs.LG

PlantBert: An Open Source Language Model for Plant Science

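Posted: June 11, 2025 | Author: jarxiv

Summary

The rapid advancement of transformer-based language models has catalyzed breakthroughs in biomedical and clinical natural language processing.
However, plant science remains markedly underserved by such domain-adapted tools.
In this work, we present PlantBert, a high-performance, open-source language model specifically tailored for extracting structured knowledge from plant stress-response literature.
Built on the DeBERTa architecture, which is known for its disentangled attention and robust contextual encoding, PlantBert is fine-tuned on a meticulously curated corpus of expert-annotated abstracts, with a primary focus on lentil (Lens culinaris) responses to diverse abiotic and biotic stressors.
Our methodology combines transformer-based modeling with rule-enhanced linguistic post-processing and ontology-grounded entity normalization, enabling PlantBert to capture biologically meaningful relationships with precision and semantic fidelity.
The underlying corpus is annotated using a hierarchical schema aligned with the Crop Ontology, encompassing molecular, physiological, biochemical, and agronomic dimensions of plant adaptation.
PlantBert exhibits strong generalization across entity types and demonstrates the feasibility of robust domain adaptation in low-resource scientific fields.
By providing a scalable and reproducible framework for high-resolution entity recognition, PlantBert bridges a critical gap in agricultural NLP and paves the way for intelligent, data-driven systems in plant genomics, phenomics, and agronomic knowledge discovery.
Our model is publicly released to promote transparency and accelerate cross-disciplinary innovation in computational plant science.

Abstract (original)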

The rapid advancement of transformer-based language models has catalyzed breakthroughs in biomedical and clinical natural language processing; however, plant science remains markedly underserved by such domain-adapted tools. In this work, we present PlantBert, a high-performance, open-source language model specifically tailored for extracting structured knowledge from plant stress-response literature. Built upon the DeBERTa architecture, known for its disentangled attention and robust contextual encoding, PlantBert is fine-tuned on a meticulously curated corpus of expert-annotated abstracts, with a primary focus on lentil (Lens culinaris) responses to diverse abiotic and biotic stressors. Our methodology combines transformer-based modeling with rule-enhanced linguistic post-processing and ontology-grounded entity normalization, enabling PlantBert to capture biologically meaningful relationships with precision and semantic fidelity. The underlying corpus is annotated using a hierarchical schema aligned with the Crop Ontology, encompassing molecular, physiological, biochemical, and agronomic dimensions of plant adaptation. PlantBert exhibits strong generalization capabilities across entity types and demonstrates the feasibility of robust domain adaptation in low-resource scientific fields. By providing a scalable and reproducible framework for high-resolution entity recognition, PlantBert bridges a critical gap in agricultural NLP and paves the way for intelligent, data-driven systems in plant genomics, phenomics, and agronomic knowledge discovery. Our model is publicly released to promote transparency and accelerate cross-disciplinary innovation in computational plant science.

arXiv information

Authors	Hiba Khey, Amine Lakhder, Salma Rouichi, Imane El Ghabi, Kamal Hejjaoui, Younes En-nahli, Fahd Kalloubi, Moez Amri
Published	2025-06-10 15:24:03+00:00

Categories: cs.AI, cs.CL

Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

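Posted: June 11, 2025 | Author: jarxiv

Summary

Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector.
However, these methods typically treat all subproblems equally and solve them with a single model, which hinders effective exploration of the solution space and leads to suboptimal performance.
To overcome this limitation, we propose POCCO, a novel plug-and-play framework that enables adaptive selection of model structures for subproblems, which are then optimized based on preference signals rather than explicit reward values.
Specifically, we design a conditional computation block that routes subproblems to specialized neural architectures.
Moreover, we propose a preference-driven optimization algorithm that learns pairwise preferences between winning and losing solutions.
We evaluate the efficacy and versatility of POCCO by applying it to two state-of-the-art neural methods for MOCOPs.
Experimental results across four classic MOCOP benchmarks demonstrate its significant superiority and strong generalization.

Abstract (original)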

Recent deep reinforcement learning methods have achieved remarkable success in solving multi-objective combinatorial optimization problems (MOCOPs) by decomposing them into multiple subproblems, each associated with a specific weight vector. However, these methods typically treat all subproblems equally and solve them using a single model, hindering the effective exploration of the solution space and thus leading to suboptimal performance. To overcome the limitation, we propose POCCO, a novel plug-and-play framework that enables adaptive selection of model structures for subproblems, which are subsequently optimized based on preference signals rather than explicit reward values. Specifically, we design a conditional computation block that routes subproblems to specialized neural architectures. Moreover, we propose a preference-driven optimization algorithm that learns pairwise preferences between winning and losing solutions. We evaluate the efficacy and versatility of POCCO by applying it to two state-of-the-art neural methods for MOCOPs. Experimental results across four classic MOCOP benchmarks demonstrate its significant superiority and strong generalization.
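Two ideas in the abstract can be illustrated with a small sketch: a subproblem defined by a weight vector, and a pairwise preference between a winning and a losing solution. The abstract specifies neither the decomposition nor the loss, so both choices below are assumptions: weighted-sum scalarization for the subproblem, and a logistic (Bradley-Terry style) loss on the policy's log-probabilities. All names are hypothetical.

```python
import math

def scalarize(objectives, weights):
    """Weighted-sum cost of one solution under one weight vector (lower is better)."""
    return sum(w * f for w, f in zip(weights, objectives))

def rank_pair(sol_a, sol_b, weights):
    """Return (winner, loser) for the subproblem defined by `weights`."""
    if scalarize(sol_a, weights) <= scalarize(sol_b, weights):
        return sol_a, sol_b
    return sol_b, sol_a

def preference_loss(logp_win, logp_lose, beta=1.0):
    """Logistic loss that shrinks as the winner becomes more likely than the loser."""
    margin = beta * (logp_win - logp_lose)
    return math.log1p(math.exp(-margin))

# Two candidate solutions, each a vector of two objective values (assumed costs).
a, b = [1.0, 5.0], [4.0, 1.0]
print(rank_pair(a, b, [0.9, 0.1]))  # subproblem that mostly weights objective 1
print(rank_pair(a, b, [0.1, 0.9]))  # subproblem that mostly weights objective 2
```

The point of the design is that training only needs to know which solution of a pair is better under a given weight vector, not the magnitude of the reward.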

arXiv information

Authors	Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Adrien Sartoretti
Published	2025-06-10 15:25:06+00:00

Categories: cs.AI

From Legal Texts to Defeasible Deontic Logic via LLMs: A Study in Automated Semantic Analysis

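Posted: June 11, 2025 | Author: jarxiv

Summary

We present a novel approach to the automated semantic analysis of legal texts using large language models (LLMs), targeting their transformation into formal representations in Defeasible Deontic Logic (DDL).
We propose a structured pipeline that segments complex normative language into atomic snippets, extracts deontic rules, and evaluates them for syntactic and semantic coherence.
Our methodology is evaluated across various LLM configurations, including prompt engineering strategies, fine-tuned models, and multi-stage pipelines, focusing on legal norms from the Australian Telecommunications Consumer Protections Code.
Empirical results demonstrate promising alignment between machine-generated and expert-crafted formalizations, showing that LLMs, particularly when prompted effectively, can contribute significantly to scalable legal informatics.

Abstract (original)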

We present a novel approach to the automated semantic analysis of legal texts using large language models (LLMs), targeting their transformation into formal representations in Defeasible Deontic Logic (DDL). We propose a structured pipeline that segments complex normative language into atomic snippets, extracts deontic rules, and evaluates them for syntactic and semantic coherence. Our methodology is evaluated across various LLM configurations, including prompt engineering strategies, fine-tuned models, and multi-stage pipelines, focusing on legal norms from the Australian Telecommunications Consumer Protections Code. Empirical results demonstrate promising alignment between machine-generated and expert-crafted formalizations, showing that LLMs – particularly when prompted effectively – can significantly contribute to scalable legal informatics.
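As a rough picture of what a formalized deontic rule might look like as data, the sketch below represents rules with a modality (obligation, prohibition, permission) and lets a superior rule defeat a conflicting one. It is heavily simplified and is not the DDL proof theory or the paper's output format: the conflict test, the `Rule` fields, and both example rules are invented for illustration and are not taken from the Telecommunications Consumer Protections Code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # atoms that must all hold for the rule to apply
    modality: str          # "O" obligation, "F" prohibition, "P" permission
    action: str

def conclusions(rules, superiority, facts):
    """Return the (modality, action) pairs that survive defeat.

    `superiority` is a set of (stronger_rule_name, weaker_rule_name) pairs.
    Simplifying assumption: two rules conflict when they concern the same
    action with different modalities.
    """
    applicable = [r for r in rules if r.conditions <= facts]
    kept = []
    for r in applicable:
        defeated = any(
            o.action == r.action
            and o.modality != r.modality
            and (o.name, r.name) in superiority
            for o in applicable
        )
        if not defeated:
            kept.append((r.modality, r.action))
    return sorted(kept)

# Invented example: a general obligation and a more specific exception.
rules = [
    Rule("r1", frozenset({"is_supplier"}), "O", "send_paper_bill"),
    Rule("r2", frozenset({"is_supplier", "customer_opted_out"}), "F", "send_paper_bill"),
]
superiority = {("r2", "r1")}

print(conclusions(rules, superiority, frozenset({"is_supplier"})))
print(conclusions(rules, superiority, frozenset({"is_supplier", "customer_opted_out"})))
```

The second call shows defeasibility: once the exception's conditions hold, the general obligation no longer yields a conclusion.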

arXiv information

Authors	Elias Horner, Cristinel Mateis, Guido Governatori, Agata Ciabattoni
Published	2025-06-10 15:25:19+00:00

Categories: cs.AI, cs.CL, cs.CY, cs.LO

Intention-Conditioned Flow Occupancy Models

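Posted: June 11, 2025 | Author: jarxiv

Summary

Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once and can then be used by anyone in the community (including those without the data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks.
Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness.
However, a fundamental challenge remains in pre-training large models for RL: actions have long-term dependencies, so it is important to train a foundation model that reasons across time.
Recent advances in generative AI have provided new tools for modeling highly complex distributions.
In this paper, we use flow matching to build a probabilistic model that predicts which states an agent will visit in the temporally distant future (i.e., an occupancy measure).
Because large datasets are often constructed by many distinct users performing distinct tasks, we include in our model a latent variable that captures the user's intention.
This intention increases the expressivity of our model and enables adaptation with generalized policy improvement.
We call our proposed method intention-conditioned flow occupancy models (InFOM).
Compared with alternative pre-training methods, our experiments on 36 state-based and 4 image-based benchmark tasks show that the proposed method achieves a 1.8x median improvement in returns and increases success rates by 36%.
Website: https://chongyi-zheng.github.io/infom
Code: https://github.com/chongyi-zheng/infom

Abstract (original)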

Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once, and then can be used by anyone in the community (including those without data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks. Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness. However, there remains a fundamental challenge to pre-train large models in the context of RL: actions have long-term dependencies, so training a foundation model that reasons across time is important. Recent advances in generative AI have provided new tools for modeling highly complex distributions. In this paper, we build a probabilistic model to predict which states an agent will visit in the temporally distant future (i.e., an occupancy measure) using flow matching. As large datasets are often constructed by many distinct users performing distinct tasks, we include in our model a latent variable capturing the user intention. This intention increases the expressivity of our model, and enables adaptation with generalized policy improvement. We call our proposed method intention-conditioned flow occupancy models (InFOM). Comparing with alternative methods for pre-training, our experiments on $36$ state-based and $4$ image-based benchmark tasks demonstrate that the proposed method achieves $1.8 \times$ median improvement in returns and increases success rates by $36\%$. Website: https://chongyi-zheng.github.io/infom Code: https://github.com/chongyi-zheng/infom
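The quantity being modeled, an occupancy measure, can be illustrated in tabular form: it weights each state an agent visits by a discount that decays with time. The sketch below computes an empirical, normalized discounted occupancy from a single finite trajectory. It only shows what the target distribution looks like. The paper instead learns a generative model of it with flow matching, and the function name and normalization over a finite trajectory are assumptions made here.

```python
from collections import defaultdict

def discounted_occupancy(states, gamma):
    """Empirical discounted distribution over the states of one trajectory.

    The state visited at step t receives weight gamma**t; weights are
    normalized so the result sums to 1 for a finite trajectory.
    """
    if not states:
        return {}
    weights = defaultdict(float)
    for t, s in enumerate(states):
        weights[s] += gamma ** t
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

occ = discounted_occupancy(["a", "b", "a"], gamma=0.5)
print(occ)
```

A discount close to 1 spreads the weight toward states far in the future, which is the regime the abstract refers to as the temporally distant future.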

arXiv information

Authors	Chongyi Zheng, Seohong Park, Sergey Levine, Benjamin Eysenbach
Published	2025-06-10 15:27:46+00:00

Categories: cs.AI, cs.LG
