jarxiv | Japanese arxiv | Page 943

Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing

Posted: April 14, 2025 | Author: jarxiv

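Summary

Making effective use of real-world driving datasets is important for improving the training of autonomous driving systems.
Offline reinforcement learning makes it possible to train autonomous vehicles on such data, but most available datasets lack meaningful reward labels.
Reward labels are essential because they give the learning algorithm the feedback it needs to tell desirable behavior from undesirable behavior, which in turn improves policy performance.
This paper introduces a new pipeline for generating human-aligned reward labels.
The approach addresses the absence of reward signals in real-world datasets by generating labels that reflect human judgment and safety considerations.
The pipeline includes an adaptive safety component, activated by analyzing semantic segmentation maps, that lets the autonomous vehicle prioritize safety over efficiency in potential collision scenarios.
The pipeline is applied, using synthetic and simulation data, to an occluded pedestrian crossing scenario with varying levels of pedestrian traffic.
The generated reward labels closely match the simulation reward labels.
When the labels are used to train a driving policy with Behavior Proximal Policy Optimisation, the results are competitive with other baselines.
This shows that the method produces reliable, human-aligned reward signals and supports training autonomous driving systems with reinforcement learning outside simulation environments, in alignment with human values.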

Summary (original)

Effective leveraging of real-world driving datasets is crucial for enhancing the training of autonomous driving systems. While Offline Reinforcement Learning enables the training of autonomous vehicles using such data, most available datasets lack meaningful reward labels. Reward labeling is essential as it provides feedback for the learning algorithm to distinguish between desirable and undesirable behaviors, thereby improving policy performance. This paper presents a novel pipeline for generating human-aligned reward labels. The proposed approach addresses the challenge of absent reward signals in real-world datasets by generating labels that reflect human judgment and safety considerations. The pipeline incorporates an adaptive safety component, activated by analyzing semantic segmentation maps, allowing the autonomous vehicle to prioritize safety over efficiency in potential collision scenarios. The proposed pipeline is applied to an occluded pedestrian crossing scenario with varying levels of pedestrian traffic, using synthetic and simulation data. The results indicate that the generated reward labels closely match the simulation reward labels. When used to train the driving policy using Behavior Proximal Policy Optimisation, the results are competitive with other baselines. This demonstrates the effectiveness of our method in producing reliable and human-aligned reward signals, facilitating the training of autonomous driving systems through Reinforcement Learning outside of simulation environments and in alignment with human values.
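The adaptive safety component described above can be illustrated with a small sketch. This is not the authors' pipeline: the class id `PEDESTRIAN_CLASS`, the region of interest, the `trigger` threshold, and both reward formulas are assumptions chosen only to show how a segmentation map might switch a reward label from an efficiency term to a safety term.

```python
import numpy as np

PEDESTRIAN_CLASS = 4  # hypothetical class id used in the segmentation map


def pedestrian_fraction(seg_map: np.ndarray, region: tuple) -> float:
    """Fraction of pixels labelled as pedestrian inside a region of interest."""
    roi = seg_map[region]
    return float(np.mean(roi == PEDESTRIAN_CLASS))


def reward_label(seg_map, speed, braking, region, trigger=0.01, target_speed=10.0):
    """Return an efficiency reward unless the safety component is triggered."""
    if pedestrian_fraction(seg_map, region) >= trigger:
        # Safety mode (assumed form): reward braking, penalise remaining speed.
        return (1.0 if braking else -1.0) - speed / target_speed
    # Efficiency mode (assumed form): reward driving close to the target speed.
    return 1.0 - abs(speed - target_speed) / target_speed


# Usage: an empty road gives the efficiency reward; pedestrians flip the sign.
seg = np.zeros((10, 10), dtype=int)
region = (slice(0, 10), slice(0, 10))
clear_road = reward_label(seg, speed=10.0, braking=False, region=region)
seg[0:2, :] = PEDESTRIAN_CLASS
not_braking = reward_label(seg, speed=10.0, braking=False, region=region)
```

In a real labeling pipeline the trigger would likely depend on distance and occlusion rather than a raw pixel fraction; the sketch only shows the switching idea.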

arXiv information

Authors	Vinal Asodia, Zhenhua Feng, Saber Fallah
Published	2025-04-11 17:11:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, cs.RO

Beyond Black-Box Predictions: Identifying Marginal Feature Effects in Tabular Transformer Networks

Posted: April 14, 2025 | Author: jarxiv

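Summary

In recent years, deep neural networks have shown their predictive power across a wide range of tasks.
Beyond natural language processing, the transformer architecture has proven efficient on tabular data problems and now challenges the gradient-based decision trees that previously dominated these areas.
However, this predictive power comes at the cost of intelligibility: marginal feature effects are almost completely lost in the black-box nature of deep tabular transformer networks.
Alternative architectures that use the additivity constraints of classical statistical regression models can keep marginal feature effects intelligible, but they often fall short of their more complex counterparts in predictive power.
To bridge the gap between intelligibility and performance, we propose an adaptation of tabular transformer networks designed to identify marginal feature effects.
We provide theoretical justification that marginal feature effects can be identified accurately, and our ablation study shows that the proposed model detects these effects efficiently even amid complex feature interactions.
To demonstrate the model's predictive capabilities, we compare it with several interpretable models as well as black-box models and find that it can match black-box performance while remaining intelligible.
The source code is available at https://github.com/OpenTabular/NAMpy.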

Summary (original)

In recent years, deep neural networks have showcased their predictive power across a variety of tasks. Beyond natural language processing, the transformer architecture has proven efficient in addressing tabular data problems and challenges the previously dominant gradient-based decision trees in these areas. However, this predictive power comes at the cost of intelligibility: Marginal feature effects are almost completely lost in the black-box nature of deep tabular transformer networks. Alternative architectures that use the additivity constraints of classical statistical regression models can maintain intelligible marginal feature effects, but often fall short in predictive power compared to their more complex counterparts. To bridge the gap between intelligibility and performance, we propose an adaptation of tabular transformer networks designed to identify marginal feature effects. We provide theoretical justifications that marginal feature effects can be accurately identified, and our ablation study demonstrates that the proposed model efficiently detects these effects, even amidst complex feature interactions. To demonstrate the model’s predictive capabilities, we compare it to several interpretable as well as black-box models and find that it can match black-box performances while maintaining intelligibility. The source code is available at https://github.com/OpenTabular/NAMpy.
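To make the additivity constraint concrete, here is a minimal sketch of an additive predictor in which each feature contributes through its own shape function, so the marginal effect of a feature can be read off directly. It is a generic illustration, not the proposed transformer adaptation and not the NAMpy API; `additive_predict` and the example shape functions are hypothetical.

```python
import numpy as np


def additive_predict(X, shape_functions, intercept=0.0):
    """Predict with f(x) = intercept + sum_j f_j(x_j).

    Returns the predictions and the per-feature contributions, whose
    j-th column is the marginal effect of feature j on each row.
    """
    contributions = np.column_stack(
        [f(X[:, j]) for j, f in enumerate(shape_functions)]
    )
    return intercept + contributions.sum(axis=1), contributions


# Usage with two hand-written shape functions (assumptions for illustration).
X = np.array([[1.0, 2.0], [3.0, 4.0]])
predictions, effects = additive_predict(X, [lambda x: 2 * x, np.square], intercept=1.0)
```

Because no term mixes two features, plotting a column of `effects` against the matching column of `X` shows that feature's effect in isolation, which is the property a black-box model loses.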

arXiv information

Authors	Anton Thielmann, Arik Reuter, Benjamin Saefken
Published	2025-04-11 17:23:09+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, stat.ML

Surrogate-based optimization of system architectures subject to hidden constraints

Posted: April 14, 2025 | Author: jarxiv

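Summary

Exploring novel architectures requires physics-based simulation because there is little prior experience to start from. This introduces two specific challenges for optimization algorithms: evaluations become more expensive (in time), and evaluations might fail.
The former challenge is addressed by Surrogate-Based Optimization (SBO) algorithms, in particular Bayesian Optimization (BO) using Gaussian Process (GP) models.
The paper gives an overview of how BO can handle challenges specific to architecture optimization, such as design variable hierarchy and multiple objectives. Specific measures include ensemble infills and a hierarchical sampling algorithm.
Evaluations might fail because the underlying solvers do not converge or because the geometry is infeasible in certain regions of the design space.
Such failed evaluations, also known as hidden constraints, pose a particular challenge to SBO/BO because the surrogate model cannot be trained on empty results.
This work investigates various strategies for satisfying hidden constraints in BO algorithms.
Three high-level strategies are identified: rejecting failed points from the training set, replacing failed points based on viable (non-failed) points, and predicting the failure region.
Investigations on a set of test problems, including a jet engine architecture optimization problem, show that the best performance is achieved by using a mixed-discrete GP to predict the Probability of Viability (PoV) and by ensuring that selected infill points satisfy a minimum PoV threshold.
The strategy is demonstrated by solving a jet engine architecture problem that has a 50% failure rate and could not previously be solved by a BO algorithm.
The developed BO algorithm and the test problems used are available in the open-source Python library SBArchOpt.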

Summary (original)

The exploration of novel architectures requires physics-based simulation due to a lack of prior experience to start from, which introduces two specific challenges for optimization algorithms: evaluations become more expensive (in time) and evaluations might fail. The former challenge is addressed by Surrogate-Based Optimization (SBO) algorithms, in particular Bayesian Optimization (BO) using Gaussian Process (GP) models. An overview is provided of how BO can deal with challenges specific to architecture optimization, such as design variable hierarchy and multiple objectives: specific measures include ensemble infills and a hierarchical sampling algorithm. Evaluations might fail due to non-convergence of underlying solvers or infeasible geometry in certain areas of the design space. Such failed evaluations, also known as hidden constraints, pose a particular challenge to SBO/BO, as the surrogate model cannot be trained on empty results. This work investigates various strategies for satisfying hidden constraints in BO algorithms. Three high-level strategies are identified: rejection of failed points from the training set, replacing failed points based on viable (non-failed) points, and predicting the failure region. Through investigations on a set of test problems including a jet engine architecture optimization problem, it is shown that best performance is achieved with a mixed-discrete GP to predict the Probability of Viability (PoV), and by ensuring selected infill points satisfy some minimum PoV threshold. This strategy is demonstrated by solving a jet engine architecture problem that features a 50% failure rate and could not previously be solved by a BO algorithm. The developed BO algorithm and used test problems are available in the open-source Python library SBArchOpt.
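The "predict the failure region" strategy with a minimum PoV threshold can be sketched as a filter on infill candidates. The sketch assumes acquisition values (lower is better) and PoV predictions are already available as arrays; `select_infill`, the 0.5 default threshold, and the fallback rule are hypothetical and are not taken from SBArchOpt.

```python
import numpy as np


def select_infill(acquisition, pov, min_pov=0.5):
    """Pick the best candidate among those predicted to be viable.

    acquisition: acquisition values, lower is better (assumed convention).
    pov: predicted Probability of Viability for each candidate, in [0, 1].
    """
    acquisition = np.asarray(acquisition, dtype=float)
    pov = np.asarray(pov, dtype=float)
    viable = pov >= min_pov
    if not viable.any():
        # Fallback (an assumption): take the candidate most likely to be viable.
        return int(np.argmax(pov))
    # Exclude non-viable candidates by giving them an infinitely bad value.
    masked = np.where(viable, acquisition, np.inf)
    return int(np.argmin(masked))


# Usage: candidate 0 has the best acquisition value but is predicted to fail.
chosen = select_infill(acquisition=[0.1, 0.5, 0.3], pov=[0.2, 0.9, 0.6])
```

Filtering on PoV keeps the optimizer from spending expensive evaluations in regions where the simulation is predicted to fail, at the cost of possibly skipping viable points near the failure boundary.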

arXiv information

Authors	Jasper Bussemaker, Paul Saves, Nathalie Bartoli, Thierry Lefebvre, Björn Nagel
Published	2025-04-11 17:35:58+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, math.OC, stat.ML

Dimension reduction for derivative-informed operator learning: An analysis of approximation errors

Posted: April 14, 2025 | Author: jarxiv

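Summary

We study the derivative-informed learning of nonlinear operators between infinite-dimensional separable Hilbert spaces by neural networks.
Such operators can arise from the solution of partial differential equations (PDEs) and are used in many simulation-based outer-loop tasks in science and engineering, such as PDE-constrained optimization, Bayesian inverse problems, and optimal experimental design.
In these settings, neural network approximations can serve as surrogate models that accelerate the solution of the outer-loop tasks.
However, because outer-loop tasks in infinite dimensions often require knowledge of the underlying geometry, the approximation accuracy of the operator's derivatives can also significantly affect the performance of the surrogate model.
Motivated by this, we analyze the approximation errors of neural operators in Sobolev norms over infinite-dimensional Gaussian input measures.
We focus on the reduced basis neural operator (RBNO), which uses linear encoders and decoders defined on dominant input/output subspaces spanned by reduced sets of orthonormal bases.
To this end, we study two methods for generating the bases: principal component analysis (PCA) and derivative-informed subspaces (DIS), which use the dominant eigenvectors of the covariance of the data or of the derivatives, respectively, as the reduced bases.
We then derive bounds for the errors arising from both the dimension reduction and the latent neural network approximation, including the sampling errors associated with the empirical estimation of the PCA/DIS.
The analysis is validated in numerical experiments with elliptic PDEs. The results show that bases informed by the map (i.e., DIS or output PCA) yield accurate reconstructions and generalization errors for both the operator and its derivatives, while input PCA may underperform unless the ranks and training sample sizes are sufficiently large.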

Summary (original)

We study the derivative-informed learning of nonlinear operators between infinite-dimensional separable Hilbert spaces by neural networks. Such operators can arise from the solution of partial differential equations (PDEs), and are used in many simulation-based outer-loop tasks in science and engineering, such as PDE-constrained optimization, Bayesian inverse problems, and optimal experimental design. In these settings, the neural network approximations can be used as surrogate models to accelerate the solution of the outer-loop tasks. However, since outer-loop tasks in infinite dimensions often require knowledge of the underlying geometry, the approximation accuracy of the operator’s derivatives can also significantly impact the performance of the surrogate model. Motivated by this, we analyze the approximation errors of neural operators in Sobolev norms over infinite-dimensional Gaussian input measures. We focus on the reduced basis neural operator (RBNO), which uses linear encoders and decoders defined on dominant input/output subspaces spanned by reduced sets of orthonormal bases. To this end, we study two methods for generating the bases; principal component analysis (PCA) and derivative-informed subspaces (DIS), which use the dominant eigenvectors of the covariance of the data or the derivatives as the reduced bases, respectively. We then derive bounds for errors arising from both the dimension reduction and the latent neural network approximation, including the sampling errors associated with the empirical estimation of the PCA/DIS. Our analysis is validated on numerical experiments with elliptic PDEs, where our results show that bases informed by the map (i.e., DIS or output PCA) yield accurate reconstructions and generalization errors for both the operator and its derivatives, while input PCA may underperform unless ranks and training sample sizes are sufficiently large.
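The linear encoder and decoder built from a reduced orthonormal basis can be sketched in finite dimensions with an empirical PCA. This is a simplified stand-in for the infinite-dimensional setting of the paper; the function names and the synthetic data are assumptions, and the DIS basis (which needs derivative samples) is not shown.

```python
import numpy as np


def pca_basis(samples, rank):
    """Empirical PCA: return the sample mean and the top-`rank` orthonormal directions."""
    mean = samples.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:rank].T  # basis has shape (dim, rank)


def encode(x, mean, basis):
    """Linear encoder: project centered data onto the reduced basis."""
    return (x - mean) @ basis


def decode(z, mean, basis):
    """Linear decoder: lift reduced coordinates back to the full space."""
    return mean + z @ basis.T


# Usage: data that lies in a 2-dimensional subspace of R^5 is recovered
# by a rank-2 basis up to floating-point error.
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))
mean, basis = pca_basis(samples, rank=2)
reconstruction = decode(encode(samples, mean, basis), mean, basis)
```

In a reduced basis neural operator, a network would map the encoded input coordinates to encoded output coordinates; the sketch covers only the dimension-reduction step whose error the paper bounds.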

arXiv information

Authors	Dingcheng Luo, Thomas O’Leary-Roseberry, Peng Chen, Omar Ghattas
Published	2025-04-11 17:56:52+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, cs.NA, math.NA

Out of Style: RAG’s Fragility to Linguistic Variation

Posted: April 14, 2025 | Author: jarxiv

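Summary

Despite the impressive performance of Retrieval-augmented Generation (RAG) systems across various NLP benchmarks, their robustness in handling real-world user-LLM interaction queries remains largely underexplored.
This is a critical gap for practical deployment, where user queries show greater linguistic variation and can trigger cascading errors across interdependent RAG components.
In this work, we systematically analyze how varying four linguistic dimensions (formality, readability, politeness, and grammatical correctness) affects RAG performance.
We evaluate two retrieval models and nine LLMs, ranging from 3 to 72 billion parameters, across four information-seeking Question Answering (QA) datasets.
Our results show that linguistic reformulations significantly affect both the retrieval and generation stages, leading to a relative performance drop of up to 40.41% in Recall@5 scores for less formal queries and 38.86% in answer match scores for queries containing grammatical errors.
Notably, RAG systems are more sensitive to such variations than LLM-only generation, which highlights their vulnerability to error propagation caused by linguistic shifts.
These findings underline the need for improved robustness techniques to make RAG reliable across diverse user interactions.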

Summary (original)

Despite the impressive performance of Retrieval-augmented Generation (RAG) systems across various NLP benchmarks, their robustness in handling real-world user-LLM interaction queries remains largely underexplored. This presents a critical gap for practical deployment, where user queries exhibit greater linguistic variations and can trigger cascading errors across interdependent RAG components. In this work, we systematically analyze how varying four linguistic dimensions (formality, readability, politeness, and grammatical correctness) impact RAG performance. We evaluate two retrieval models and nine LLMs, ranging from 3 to 72 billion parameters, across four information-seeking Question Answering (QA) datasets. Our results reveal that linguistic reformulations significantly impact both retrieval and generation stages, leading to a relative performance drop of up to 40.41% in Recall@5 scores for less formal queries and 38.86% in answer match scores for queries containing grammatical errors. Notably, RAG systems exhibit greater sensitivity to such variations compared to LLM-only generations, highlighting their vulnerability to error propagation due to linguistic shifts. These findings highlight the need for improved robustness techniques to enhance reliability in diverse user interactions.
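The two quantities quoted above, Recall@5 and a relative performance drop, can be computed as in the following sketch. The definitions are common ones and are assumed here; the paper may define Recall@5 differently (for example as a per-query hit rate), and the numbers in the usage lines are made up for illustration.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Share of the relevant documents that appear in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def relative_drop(original, perturbed):
    """Relative performance drop in percent when moving from original to perturbed queries."""
    return 100.0 * (original - perturbed) / original


# Usage with illustrative values: one of two relevant documents is in the top 5,
# and a score falling from 0.8 to 0.6 is a 25% relative drop.
score = recall_at_k(["a", "b", "c", "d", "e", "f"], relevant=["a", "f"], k=5)
drop = relative_drop(original=0.8, perturbed=0.6)
```

A relative drop is measured against the original score, so a 40.41% relative drop is not the same as losing 40.41 points of Recall@5.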

arXiv information

Authors	Tianyu Cao, Neel Bhandari, Akhila Yerukola, Akari Asai, Maarten Sap
Published	2025-04-11 03:30:26+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CL

Humanity’s Last Exam

Posted: April 14, 2025 | Author: jarxiv

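Summary

Benchmarks are important tools for tracking the rapid advancement of large language model (LLM) capabilities.
However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks such as MMLU, which limits informed measurement of state-of-the-art LLM capabilities.
In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge.
HLE consists of 2,700 questions across dozens of subjects, including mathematics, the humanities, and the natural sciences.
HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading.
Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered via internet retrieval.
State-of-the-art LLMs show low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions.
To inform research and policymaking with a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.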

Summary (original)

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity’s Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,700 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
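Accuracy and calibration can be measured together when a model reports a confidence with each answer. The sketch below uses a binned expected calibration error, which is one common choice and is an assumption here; the abstract does not say which calibration metric HLE uses, and `accuracy_and_ece` is a hypothetical helper.

```python
import numpy as np


def accuracy_and_ece(confidences, correct, n_bins=10):
    """Return (accuracy, expected calibration error) for graded answers.

    confidences: model-stated probabilities of being correct, in [0, 1].
    correct: 1 if the answer was graded correct, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Assign each answer to an equal-width confidence bin.
    bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bin's share of answers
    return float(correct.mean()), float(ece)


# Usage: a model that is 90% confident but right only 1 time in 4 is badly calibrated.
accuracy, ece = accuracy_and_ece([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0])
```

A well-calibrated model would have confidence close to its accuracy in every bin, so low accuracy paired with high stated confidence shows up as a large calibration error.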

arXiv information

著者	Long Phan,Alice Gatti,Ziwen Han,Nathaniel Li,Josephina Hu,Hugh Zhang,Chen Bo Calvin Zhang,Mohamed Shaaban,John Ling,Sean Shi,Michael Choi,Anish Agrawal,Arnav Chopra,Adam Khoja,Ryan Kim,Richard Ren,Jason Hausenloy,Oliver Zhang,Mantas Mazeika,Dmitry Dodonov,Tung Nguyen,Jaeho Lee,Daron Anderson,Mikhail Doroshenko,Alun Cennyth Stokes,Mobeen Mahmood,Oleksandr Pokutnyi,Oleg Iskra,Jessica P. Wang,John-Clark Levin,Mstyslav Kazakov,Fiona Feng,Steven Y. Feng,Haoran Zhao,Michael Yu,Varun Gangal,Chelsea Zou,Zihan Wang,Serguei Popov,Robert Gerbicz,Geoff Galgon,Johannes Schmitt,Will Yeadon,Yongki Lee,Scott Sauers,Alvaro Sanchez,Fabian Giska,Marc Roth,Søren Riis,Saiteja Utpala,Noah Burns,Gashaw M. Goshu,Mohinder Maheshbhai Naiya,Chidozie Agu,Zachary Giboney,Antrell Cheatom,Francesco Fournier-Facio,Sarah-Jane Crowson,Lennart Finke,Zerui Cheng,Jennifer Zampese,Ryan G. Hoerr,Mark Nandor,Hyunwoo Park,Tim Gehrunger,Jiaqi Cai,Ben McCarty,Alexis C Garretson,Edwin Taylor,Damien Sileo,Qiuyu Ren,Usman Qazi,Lianghui Li,Jungbae Nam,John B. Wydallis,Pavel Arkhipov,Jack Wei Lun Shi,Aras Bacho,Chris G. Willcocks,Hangrui Cao,Sumeet Motwani,Emily de Oliveira Santos,Johannes Veith,Edward Vendrow,Doru Cojoc,Kengo Zenitani,Joshua Robinson,Longke Tang,Yuqi Li,Joshua Vendrow,Natanael Wildner Fraga,Vladyslav Kuchkin,Andrey Pupasov Maksimov,Pierre Marion,Denis Efremov,Jayson Lynch,Kaiqu Liang,Aleksandar Mikov,Andrew Gritsevskiy,Julien Guillod,Gözdenur Demir,Dakotah Martinez,Ben Pageler,Kevin Zhou,Saeed Soori,Ori Press,Henry Tang,Paolo Rissone,Sean R. 
Green,Lina Brüssel,Moon Twayana,Aymeric Dieuleveut,Joseph Marvin Imperial,Ameya Prabhu,Jinzhou Yang,Nick Crispino,Arun Rao,Dimitri Zvonkine,Gabriel Loiseau,Mikhail Kalinin,Marco Lukas,Ciprian Manolescu,Nate Stambaugh,Subrata Mishra,Tad Hogg,Carlo Bosio,Brian P Coppola,Julian Salazar,Jaehyeok Jin,Rafael Sayous,Stefan Ivanov,Philippe Schwaller,Shaipranesh Senthilkuma,Andres M Bran,Andres Algaba,Kelsey Van den Houte,Lynn Van Der Sypt,Brecht Verbeken,David Noever,Alexei Kopylov,Benjamin Myklebust,Bikun Li,Lisa Schut,Evgenii Zheltonozhskii,Qiaochu Yuan,Derek Lim,Richard Stanley,Tong Yang,John Maar,Julian Wykowski,Martí Oller,Anmol Sahu,Cesare Giulio Ardito,Yuzheng Hu,Ariel Ghislain Kemogne Kamdoum,Alvin Jin,Tobias Garcia Vilchis,Yuexuan Zu,Martin Lackner,James Koppel,Gongbo Sun,Daniil S. Antonenko,Steffi Chern,Bingchen Zhao,Pierrot Arsene,Joseph M Cavanagh,Daofeng Li,Jiawei Shen,Donato Crisostomi,Wenjin Zhang,Ali Dehghan,Sergey Ivanov,David Perrella,Nurdin Kaparov,Allen Zang,Ilia Sucholutsky,Arina Kharlamova,Daniil Orel,Vladislav Poritski,Shalev Ben-David,Zachary Berger,Parker Whitfill,Michael Foster,Daniel Munro,Linh Ho,Shankar Sivarajan,Dan Bar Hava,Aleksey Kuchkin,David Holmes,Alexandra Rodriguez-Romero,Frank Sommerhage,Anji Zhang,Richard Moat,Keith Schneider,Zakayo Kazibwe,Don Clarke,Dae Hyun Kim,Felipe Meneguitti Dias,Sara Fish,Veit Elser,Tobias Kreiman,Victor Efren Guadarrama Vilchis,Immo Klose,Ujjwala Anantheswaran,Adam Zweiger,Kaivalya Rawal,Jeffery Li,Jeremy Nguyen,Nicolas Daans,Haline Heidinger,Maksim Radionov,Václav Rozhoň,Vincent Ginis,Christian Stump,Niv Cohen,Rafał Poświata,Josef Tkadlec,Alan Goldfarb,Chenguang Wang,Piotr Padlewski,Stanislaw Barzowski,Kyle Montgomery,Ryan Stendall,Jamie Tucker-Foltz,Jack Stade,T. 
Ryan Rogers,Tom Goertzen,Declan Grabb,Abhishek Shukla,Alan Givré,John Arnold Ambay,Archan Sen,Muhammad Fayez Aziz,Mark H Inlow,Hao He,Ling Zhang,Younesse Kaddar,Ivar Ängquist,Yanxu Chen,Harrison K Wang,Kalyan Ramakrishnan,Elliott Thornley,Antonio Terpin,Hailey Schoelkopf,Eric Zheng,Avishy Carmi,Ethan D. L. Brown,Kelin Zhu,Max Bartolo,Richard Wheeler,Martin Stehberger,Peter Bradshaw,JP Heimonen,Kaustubh Sridhar,Ido Akov,Jennifer Sandlin,Yury Makarychev,Joanna Tam,Hieu Hoang,David M. Cunningham,Vladimir Goryachev,Demosthenes Patramanis,Michael Krause,Andrew Redenti,David Aldous,Jesyin Lai,Shannon Coleman,Jiangnan Xu,Sangwon Lee,Ilias Magoulas,Sandy Zhao,Ning Tang,Michael K. Cohen,Orr Paradise,Jan Hendrik Kirchner,Maksym Ovchynnikov,Jason O. Matos,Adithya Shenoy,Michael Wang,Yuzhou Nie,Anna Sztyber-Betley,Paolo Faraboschi,Robin Riblet,Jonathan Crozier,Shiv Halasyamani,Shreyas Verma,Prashant Joshi,Eli Meril,Ziqiao Ma,Jérémy Andréoletti,Raghav Singhal,Jacob Platnick,Volodymyr Nevirkovets,Luke Basler,Alexander Ivanov,Seri Khoury,Nils Gustafsson,Marco Piccardo,Hamid Mostaghimi,Qijia Chen,Virendra Singh,Tran Quoc Khánh,Paul Rosu,Hannah Szlyk,Zachary Brown,Himanshu Narayan,Aline Menezes,Jonathan Roberts,William Alley,Kunyang Sun,Arkil Patel,Max Lamparth,Anka Reuel,Linwei Xin,Hanmeng Xu,Jacob Loader,Freddie Martin,Zixuan Wang,Andrea Achilleos,Thomas Preu,Tomek Korbak,Ida Bosio,Fereshteh Kazemi,Ziye Chen,Biró Bálint,Eve J. Y. Lo,Jiaqi Wang,Maria Inês S. Nunes,Jeremiah Milbauer,M Saiful Bari,Zihao Wang,Behzad Ansarinejad,Yewen Sun,Stephane Durand,Hossam Elgnainy,Guillaume Douville,Daniel Tordera,George Balabanian,Hew Wolff,Lynna Kvistad,Hsiaoyun Milliron,Ahmad Sakor,Murat Eron,Andrew Favre D. O.,Shailesh Shah,Xiaoxiang Zhou,Firuz Kamalov,Sherwin Abdoli,Tim Santens,Shaul Barkan,Allison Tee,Robin Zhang,Alessandro Tomasiello,G. 
Bruno De Luca,Shi-Zhuo Looi,Vinh-Kha Le,Noam Kolt,Jiayi Pan,Emma Rodman,Jacob Drori,Carl J Fossum,Niklas Muennighoff,Milind Jagota,Ronak Pradeep,Honglu Fan,Jonathan Eicher,Michael Chen,Kushal Thaman,William Merrill,Moritz Firsching,Carter Harris,Stefan Ciobâcă,Jason Gross,Rohan Pandey,Ilya Gusev,Adam Jones,Shashank Agnihotri,Pavel Zhelnov,Mohammadreza Mofayezi,Alexander Piperski,David K. Zhang,Kostiantyn Dobarskyi,Roman Leventov,Ignat Soroko,Joshua Duersch,Vage Taamazyan,Andrew Ho,Wenjie Ma,William Held,Ruicheng Xian,Armel Randy Zebaze,Mohanad Mohamed,Julian Noah Leser,Michelle X Yuan,Laila Yacar,Johannes Lengler,Katarzyna Olszewska,Claudio Di Fratta,Edson Oliveira,Joseph W. Jackson,Andy Zou,Muthu Chidambaram,Timothy Manik,Hector Haffenden,Dashiell Stander,Ali Dasouqi,Alexander Shen,Bita Golshani,David Stap,Egor Kretov,Mikalai Uzhou,Alina Borisovna Zhidkovskaya,Nick Winter,Miguel Orbegozo Rodriguez,Robert Lauff,Dustin Wehr,Colin Tang,Zaki Hossain,Shaun Phillips,Fortuna Samuele,Fredrik Ekström,Angela Hammon,Oam Patel,Faraz Farhidi,George Medley,Forough Mohammadzadeh,Madellene Peñaflor,Haile Kassahun,Alena Friedrich,Rayner Hernandez Perez,Daniel Pyda,Taom Sakal,Omkar Dhamane,Ali Khajegili Mirabadi,Eric Hallman,Kenchi Okutsu,Mike Battaglia,Mohammad Maghsoudimehrabani,Alon Amit,Dave Hulbert,Roberto Pereira,Simon Weber,Handoko,Anton Peristyy,Stephen Malina,Mustafa Mehkary,Rami Aly,Frank Reidegeld,Anna-Katharina Dick,Cary Friday,Mukhwinder Singh,Hassan Shapourian,Wanyoung Kim,Mariana Costa,Hubeyb Gurdogan,Harsh Kumar,Chiara Ceconello,Chao Zhuang,Haon Park,Micah Carroll,Andrew R. Tawfeek,Stefan Steinerberger,Daattavya Aggarwal,Michael Kirchhof,Linjie Dai,Evan Kim,Johan Ferret,Jainam Shah,Yuzhou Wang,Minghao Yan,Krzysztof Burdzy,Lixin Zhang,Antonio Franca,Diana T. Pham,Kang Yong Loh,Joshua Robinson,Abram Jackson,Paolo Giordano,Philipp Petersen,Adrian Cosma,Jesus Colino,Colin White,Jacob Votava,Vladimir Vinnikov,Ethan Delaney,Petr Spelda,Vit Stritecky,Syed M. 
Shahid,Jean-Christophe Mourrat,Lavr Vetoshkin,Koen Sponselee,Renas Bacho,Zheng-Xin Yong,Florencia de la Rosa,Nathan Cho,Xiuyu Li,Guillaume Malod,Orion Weller,Guglielmo Albani,Leon Lang,Julien Laurendeau,Dmitry Kazakov,Fatimah Adesanya,Julien Portier,Lawrence Hollom,Victor Souza,Yuchen Anna Zhou,Julien Degorre,Yiğit Yalın,Gbenga Daniel Obikoya,Rai,Filippo Bigi,M. C. Boscá,Oleg Shumar,Kaniuar Bacho,Gabriel Recchia,Mara Popescu,Nikita Shulga,Ngefor Mildred Tanwie,Thomas C. H. Lux,Ben Rank,Colin Ni,Matthew Brooks,Alesia Yakimchyk,Huanxu,Liu,Stefano Cavalleri,Olle Häggström,Emil Verkama,Joshua Newbould,Hans Gundlach,Leonor Brito-Santana,Brian Amaro,Vivek Vajipey,Rynaa Grover,Ting Wang,Yosi Kratish,Wen-Ding Li,Sivakanth Gopi,Andrea Caciolai,Christian Schroeder de Witt,Pablo Hernández-Cámara,Emanuele Rodolà,Jules Robins,Dominic Williamson,Vincent Cheng,Brad Raynor,Hao Qi,Ben Segev,Jingxuan Fan,Sarah Martinson,Erik Y. Wang,Kaylie Hausknecht,Michael P. Brenner,Mao Mao,Christoph Demian,Peyman Kassani,Xinyu Zhang,David Avagian,Eshawn Jessica Scipio,Alon Ragoler,Justin Tan,Blake Sims,Rebeka Plecnik,Aaron Kirtland,Omer Faruk Bodur,D. P. Shinde,Yan Carlos Leyva Labrador,Zahra Adoul,Mohamed Zekry,Ali Karakoc,Tania C. B. Santos,Samir Shamseldeen,Loukmane Karim,Anna Liakhovitskaia,Nate Resman,Nicholas Farina,Juan Carlos Gonzalez,Gabe Maayan,Earth Anderson,Rodrigo De Oliveira Pena,Elizabeth Kelley,Hodjat Mariji,Rasoul Pouriamanesh,Wentao Wu,Ross Finocchio,Ismail Alarab,Joshua Cole,Danyelle Ferreira,Bryan Johnson,Mohammad Safdari,Liangti Dai,Siriphan Arthornthurasuk,Isaac C. McAlister,Alejandro José Moyano,Alexey Pronin,Jing Fan,Angel Ramirez-Trinidad,Yana Malysheva,Daphiny Pottmaier,Omid Taheri,Stanley Stepanic,Samuel Perry,Luke Askew,Raúl Adrián Huerta Rodríguez,Ali M. R. Minissi,Ricardo Lorena,Krishnamurthy Iyer,Arshad Anil Fasiludeen,Ronald Clark,Josh Ducey,Matheus Piza,Maja Somrak,Eric Vergo,Juehang Qin,Benjámin Borbás,Eric Chu,Jack Lindsey,Antoine Jallon,I. M. J. 
McInnis,Evan Chen,Avi Semler,Luk Gloor,Tej Shah,Marc Carauleanu,Pascal Lauer,Tran Đuc Huy,Hossein Shahrtash,Emilien Duc,Lukas Lewark,Assaf Brown,Samuel Albanie,Brian Weber,Warren S. Vaz,Pierre Clavier,Yiyang Fan,Gabriel Poesia Reis e Silva,Long,Lian,Marcus Abramovitch,Xi Jiang,Sandra Mendoza,Murat Islam,Juan Gonzalez,Vasilios Mavroudis,Justin Xu,Pawan Kumar,Laxman Prasad Goswami,Daniel Bugas,Nasser Heydari,Ferenc Jeanplong,Thorben Jansen,Antonella Pinto,Archimedes Apronti,Abdallah Galal,Ng Ze-An,Ankit Singh,Tong Jiang,Joan of Arc Xavier,Kanu Priya Agarwal,Mohammed Berkani,Gang Zhang,Zhehang Du,Benedito Alves de Oliveira Junior,Dmitry Malishev,Nicolas Remy,Taylor D. Hartman,Tim Tarver,Stephen Mensah,Gautier Abou Loume,Wiktor Morak,Farzad Habibi,Sarah Hoback,Will Cai,Javier Gimenez,Roselynn Grace Montecillo,Jakub Łucki,Russell Campbell,Asankhaya Sharma,Khalida Meer,Shreen Gul,Daniel Espinosa Gonzalez,Xavier Alapont,Alex Hoover,Gunjan Chhablani,Freddie Vargus,Arunim Agarwal,Yibo Jiang,Deepakkumar Patil,David Outevsky,Kevin Joseph Scaria,Rajat Maheshwari,Abdelkader Dendane,Priti Shukla,Ashley Cartwright,Sergei Bogdanov,Niels Mündler,Sören Möller,Luca Arnaboldi,Kunvar Thaman,Muhammad Rehan Siddiqi,Prajvi Saxena,Himanshu Gupta,Tony Fruhauff,Glen Sherman,Mátyás Vincze,Siranut Usawasutsakorn,Dylan Ler,Anil Radhakrishnan,Innocent Enyekwe,Sk Md Salauddin,Jiang Muzhen,Aleksandr Maksapetyan,Vivien Rossbach,Chris Harjadi,Mohsen Bahaloohoreh,Claire Sparrow,Jasdeep Sidhu,Sam Ali,Song Bian,John Lai,Eric Singer,Justine Leon Uro,Greg Bateman,Mohamed Sayed,Ahmed Menshawy,Darling Duclosel,Dario Bezzi,Yashaswini Jain,Ashley Aaron,Murat Tiryakioglu,Sheeshram Siddh,Keith Krenek,Imad Ali Shah,Jun Jin,Scott Creighton,Denis Peskoff,Zienab EL-Wasif,Ragavendran P V,Michael Richmond,Joseph McGowan,Tejal Patwardhan,Hao-Yu Sun,Ting Sun,Nikola Zubić,Samuele Sala,Stephen Ebert,Jean Kaddour,Manuel Schottdorf,Dianzhuo Wang,Gerol Petruzella,Alex Meiburg,Tilen Medved,Ali ElSheikh,S Ashwin 
Hebbar,Lorenzo Vaquero,Xianjun Yang,Jason Poulos,Vilém Zouhar,Sergey Bogdanik,Mingfang Zhang,Jorge Sanz-Ros,David Anugraha,Yinwei Dai,Anh N. Nhu,Xue Wang,Ali Anil Demircali,Zhibai Jia,Yuyin Zhou,Juncheng Wu,Mike He,Nitin Chandok,Aarush Sinha,Gaoxiang Luo,Long Le,Mickaël Noyé,Ioannis Pantidis,Tianbo Qi,Soham Sachin Purohit,Letitia Parcalabescu,Thai-Hoa Nguyen,Genta Indra Winata,Edoardo M. Ponti,Hanchen Li,Kaustubh Dhole,Jongee Park,Dario Abbondanza,Yuanli Wang,Anupam Nayak,Diogo M. Caetano,Antonio A. W. L. Wong,Maria del Rio-Chanona,Dániel Kondor,Pieter Francois,Ed Chalstrey,Jakob Zsambok,Dan Hoyer,Jenny Reddish,Jakob Hauser,Francisco-Javier Rodrigo-Ginés,Suchandra Datta,Maxwell Shepherd,Thom Kamphuis,Qizheng Zhang,Hyunjun Kim,Ruiji Sun,Jianzhu Yao,Franck Dernoncourt,Satyapriya Krishna,Sina Rismanchian,Bonan Pu,Francesco Pinto,Yingheng Wang,Kumar Shridhar,Kalon J. Overholt,Glib Briia,Hieu Nguyen,David,Soler Bartomeu,Tony CY Pang,Adam Wecker,Yifan Xiong,Fanfei Li,Lukas S. Huber,Joshua Jaeger,Romano De Maddalena,Xing Han Lù,Yuhui Zhang,Claas Beger,Patrick Tser Jern Kon,Sean Li,Vivek Sanker,Ming Yin,Yihao Liang,Xinlu Zhang,Ankit Agrawal,Li S. 
Yifei,Zechen Zhang,Mu Cai,Yasin Sonmez,Costin Cozianu,Changhao Li,Alex Slen,Shoubin Yu,Hyun Kyu Park,Gabriele Sarti,Marcin Briański,Alessandro Stolfo,Truong An Nguyen,Mike Zhang,Yotam Perlitz,Jose Hernandez-Orallo,Runjia Li,Amin Shabani,Felix Juefei-Xu,Shikhar Dhingra,Orr Zohar,My Chiffon Nguyen,Alexander Pondaven,Abdurrahim Yilmaz,Xuandong Zhao,Chuanyang Jin,Muyan Jiang,Stefan Todoran,Xinyao Han,Jules Kreuer,Brian Rabern,Anna Plassart,Martino Maggetti,Luther Yap,Robert Geirhos,Jonathon Kean,Dingsu Wang,Sina Mollaei,Chenkai Sun,Yifan Yin,Shiqi Wang,Rui Li,Yaowen Chang,Anjiang Wei,Alice Bizeul,Xiaohan Wang,Alexandre Oliveira Arrais,Kushin Mukherjee,Jorge Chamorro-Padial,Jiachen Liu,Xingyu Qu,Junyi Guan,Adam Bouyamourn,Shuyu Wu,Martyna Plomecka,Junda Chen,Mengze Tang,Jiaqi Deng,Shreyas Subramanian,Haocheng Xi,Haoxuan Chen,Weizhi Zhang,Yinuo Ren,Haoqin Tu,Sejong Kim,Yushun Chen,Sara Vera Marjanović,Junwoo Ha,Grzegorz Luczyna,Jeff J. Ma,Zewen Shen,Dawn Song,Cedegao E. Zhang,Zhun Wang,Gaël Gendron,Yunze Xiao,Leo Smucker,Erica Weng,Kwok Hao Lee,Zhe Ye,Stefano Ermon,Ignacio D. Lopez-Miguel,Theo Knights,Anthony Gitter,Namkyu Park,Boyi Wei,Hongzheng Chen,Kunal Pai,Ahmed Elkhanany,Han Lin,Philipp D. Siedler,Jichao Fang,Ritwik Mishra,Károly Zsolnai-Fehér,Xilin Jiang,Shadab Khan,Jun Yuan,Rishab Kumar Jain,Xi Lin,Mike Peterson,Zhe Wang,Aditya Malusare,Maosen Tang,Isha Gupta,Ivan Fosin,Timothy Kang,Barbara Dworakowska,Kazuki Matsumoto,Guangyao Zheng,Gerben Sewuster,Jorge Pretel Villanueva,Ivan Rannev,Igor Chernyavsky,Jiale Chen,Deepayan Banik,Ben Racz,Wenchao Dong,Jianxin Wang,Laila Bashmal,Duarte V. Gonçalves,Wei Hu,Kaushik Bar,Ondrej Bohdal,Atharv Singh Patlan,Shehzaad Dhuliawala,Caroline Geirhos,Julien Wist,Yuval Kansal,Bingsen Chen,Kutay Tire,Atak Talay Yücel,Brandon Christof,Veerupaksh Singla,Zijian Song,Sanxing Chen,Jiaxin Ge,Kaustubh Ponkshe,Isaac Park,Tianneng Shi,Martin Q. 
Ma,Joshua Mak,Sherwin Lai,Antoine Moulin,Zhuo Cheng,Zhanda Zhu,Ziyi Zhang,Vaidehi Patil,Ketan Jha,Qiutong Men,Jiaxuan Wu,Tianchi Zhang,Bruno Hebling Vieira,Alham Fikri Aji,Jae-Won Chung,Mohammed Mahfoud,Ha Thi Hoang,Marc Sperzel,Wei Hao,Kristof Meding,Sihan Xu,Vassilis Kostakos,Davide Manini,Yueying Liu,Christopher Toukmaji,Jay Paek,Eunmi Yu,Arif Engin Demircali,Zhiyi Sun,Ivan Dewerpe,Hongsen Qin,Roman Pflugfelder,James Bailey,Johnathan Morris,Ville Heilala,Sybille Rosset,Zishun Yu,Peter E. Chen,Woongyeong Yeo,Eeshaan Jain,Ryan Yang,Sreekar Chigurupati,Julia Chernyavsky,Sai Prajwal Reddy,Subhashini Venugopalan,Hunar Batra,Core Francisco Park,Hieu Tran,Guilherme Maximiano,Genghan Zhang,Yizhuo Liang,Hu Shiyu,Rongwu Xu,Rui Pan,Siddharth Suresh,Ziqi Liu,Samaksh Gulati,Songyang Zhang,Peter Turchin,Christopher W. Bartlett,Christopher R. Scotese,Phuong M. Cao,Aakaash Nattanmai,Gordon McKellips,Anish Cheraku,Asim Suhail,Ethan Luo,Marvin Deng,Jason Luo,Ashley Zhang,Kavin Jindel,Jay Paek,Kasper Halevy,Allen Baranov,Michael Liu,Advaith Avadhanam,David Zhang,Vincent Cheng,Brad Ma,Evan Fu,Liam Do,Joshua Lass,Hubert Yang,Surya Sunkari,Vishruth Bharath,Violet Ai,James Leung,Rishit Agrawal,Alan Zhou,Kevin Chen,Tejas Kalpathi,Ziqi Xu,Gavin Wang,Tyler Xiao,Erik Maung,Sam Lee,Ryan Yang,Roy Yue,Ben Zhao,Julia Yoon,Sunny Sun,Aryan Singh,Ethan Luo,Clark Peng,Tyler Osbey,Taozhi Wang,Daryl Echeazu,Hubert Yang,Timothy Wu,Spandan Patel,Vidhi Kulkarni,Vijaykaarti Sundarapandiyan,Ashley Zhang,Andrew Le,Zafir Nasim,Srikar Yalam,Ritesh Kasamsetty,Soham Samal,Hubert Yang,David Sun,Nihar Shah,Abhijeet Saha,Alex Zhang,Leon Nguyen,Laasya Nagumalli,Kaixin Wang,Alan Zhou,Aidan Wu,Jason Luo,Anwith Telluri,Summer Yue,Alexandr Wang,Dan Hendrycks
Published	2025-04-11 03:34:49+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

IFShip: Interpretable Fine-grained Ship Classification with Domain Knowledge-Enhanced Vision-Language Models

Posted: April 14, 2025 | Author: jarxiv

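Summary

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task.
However, the inference process remains uninterpretable, which has led to criticism of these models as "black box" systems.
To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct TITANIC-FGS, a task-specific instruction-following dataset.
By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip.
Building on IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language.
Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy.
Furthermore, IFShip shows superior performance on the FGSC task compared with VLMs such as LLaVA and MiniGPT-4.
It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.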

Summary (original)

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as ‘black box’ systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.

arXiv information

Authors	Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao
Published	2025-04-11 03:53:03+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL | Comments are closed

Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner

Posted on April 14, 2025 by jarxiv

Abstract

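State-based sequence models such as RWKV-7 offer a compelling alternative to Transformer architectures, achieving linear complexity while demonstrating greater expressive power in short-context scenarios and enabling state tracking beyond the $\text{TC}^0$ complexity class.
However, RWKV-7 lacks mechanisms for token-parameter interactions and native scalability, which limits its adaptability and growth without retraining.
In this paper, we propose Meta-State, a novel extension to RWKV-7 that replaces attention mechanisms with a fully state-driven approach and integrates token-parameter interactions through a Self-State Encoder (SSE) mechanism.
The SSE repurposes a portion of the RWKV-7 Weighted Key-Value (WKV) state as transformation weights, encoding token-parameter interactions in a linear, state-driven manner without introducing new trainable matrices or softmax operations, while preserving the autoregressive property of token processing.
Meta-State supports progressive model scaling by expanding the WKV state and parameter tokens, reusing existing parameters without retraining.
Our approach bridges the gap between state-based modeling, token-parameter interactions, and scalable architectures, offering a flexible framework for efficient and adaptable sequence modeling with linear complexity and constant memory usage.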

Abstract (original)

State-based sequence models like RWKV-7 offer a compelling alternative to Transformer architectures, achieving linear complexity while demonstrating greater expressive power in short-context scenarios and enabling state tracking beyond the $\text{TC}^0$ complexity class. However, RWKV-7 lacks mechanisms for token-parameter interactions and native scalability, limiting its adaptability and growth without retraining. In this paper, we propose Meta-State, a novel extension to RWKV-7 that replaces attention mechanisms with a fully state-driven approach, integrating token-parameter interactions through a Self-State Encoder (SSE) mechanism. The SSE repurposes a portion of the RWKV-7 Weighted Key-Value (WKV) state as transformation weights to encode token-parameter interactions in a linear, state-driven manner without introducing new trainable matrices or softmax operations, while preserving the autoregressive property of token processing. Meta-State supports progressive model scaling by expanding the WKV state and parameter tokens, reusing existing parameters without retraining. Our approach bridges the gap between state-based modeling, token-parameter interactions, and scalable architectures, offering a flexible framework for efficient and adaptable sequence modeling with linear complexity and constant memory usage.

arXiv information

Authors	Liu Xiao, Li Zhiyuan, Lin Yueyu
Published	2025-04-11 04:14:32+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.LG | Comments are closed

MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula

Posted on April 14, 2025 by jarxiv

Abstract

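In various academic and professional settings, such as mathematics lectures and research presentations, it is often necessary to convey mathematical expressions orally.
However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or who rely on subtitles because of language barriers.
For instance, when a presenter reads Euler's formula, current Automatic Speech Recognition (ASR) models often produce a verbose and error-prone textual description (e.g., e to the power of i x equals cosine of x plus i $\textit{side}$ of x) instead of the concise $\LaTeX{}$ format (i.e., $ e^{ix} = \cos(x) + i\sin(x) $), which hampers clear understanding and communication.
To address this issue, we introduce MathSpeech, a novel pipeline that integrates ASR models with small Language Models (sLMs) to correct errors in mathematical expressions and accurately convert spoken expressions into structured $\LaTeX{}$ representations.
Evaluated on a new dataset derived from lecture recordings, MathSpeech demonstrates $\LaTeX{}$ generation capabilities comparable to leading commercial Large Language Models (LLMs), while using fine-tuned small language models of only 120M parameters.
Specifically, in terms of CER, BLEU, and ROUGE scores for $\LaTeX{}$ translation, MathSpeech demonstrated significantly superior capabilities compared to GPT-4o.
CER decreased from 0.390 to 0.298, and higher ROUGE/BLEU scores were observed compared to GPT-4o.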

Abstract (original)

In various academic and professional settings, such as mathematics lectures or research presentations, it is often necessary to convey mathematical expressions orally. However, reading mathematical expressions aloud without accompanying visuals can significantly hinder comprehension, especially for those who are hearing-impaired or rely on subtitles due to language barriers. For instance, when a presenter reads Euler’s Formula, current Automatic Speech Recognition (ASR) models often produce a verbose and error-prone textual description (e.g., e to the power of i x equals cosine of x plus i $\textit{side}$ of x), instead of the concise $\LaTeX{}$ format (i.e., $ e^{ix} = \cos(x) + i\sin(x) $), which hampers clear understanding and communication. To address this issue, we introduce MathSpeech, a novel pipeline that integrates ASR models with small Language Models (sLMs) to correct errors in mathematical expressions and accurately convert spoken expressions into structured $\LaTeX{}$ representations. Evaluated on a new dataset derived from lecture recordings, MathSpeech demonstrates $\LaTeX{}$ generation capabilities comparable to leading commercial Large Language Models (LLMs), while leveraging fine-tuned small language models of only 120M parameters. Specifically, in terms of CER, BLEU, and ROUGE scores for $\LaTeX{}$ translation, MathSpeech demonstrated significantly superior capabilities compared to GPT-4o. We observed a decrease in CER from 0.390 to 0.298, and higher ROUGE/BLEU scores compared to GPT-4o.
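
As a rough illustration of the CER figures quoted above, the following is a minimal sketch of the commonly used definition of character error rate: the character-level Levenshtein distance divided by the reference length. The abstract does not say how the authors tokenize or normalize $\LaTeX{}$ strings before scoring, so the character-level comparison here is an assumption. The `reference` and `hypothesis` strings are made up for illustration, based on the "side" versus "sine" example in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical strings, not taken from the paper's dataset.
reference = r"e^{ix} = \cos(x) + i\sin(x)"
hypothesis = r"e^{ix} = \cos(x) + i\side(x)"

print(f"CER = {cer(reference, hypothesis):.3f}")
```

Under this definition a lower value is better, and a CER of 0.298 would mean roughly 30 character edits per 100 reference characters.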

arXiv information

Authors	Sieun Hyeon, Kyudan Jung, Jaehee Won, Nam-Joon Kim, Hyun Gon Ryu, Hyuk-Jae Lee, Jaeyoung Do
Published	2025-04-11 04:17:44+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL | Comments are closed

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

Posted on April 14, 2025 by jarxiv

Abstract

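Multimodal Large Language Models (MLLMs) have shown significant advancements, offering a promising future for embodied agents.
Existing benchmarks for evaluating MLLMs primarily use static images or videos, limiting assessment to non-interactive scenarios.
Meanwhile, existing embodied AI benchmarks are task-specific and insufficiently diverse, so they do not adequately evaluate the embodied capabilities of MLLMs.
To address this, we propose EmbodiedEval, a comprehensive and interactive evaluation benchmark for MLLMs with embodied tasks.
EmbodiedEval features 328 distinct tasks within 125 varied 3D scenes, each rigorously selected and annotated.
It covers a broad spectrum of existing embodied AI tasks with significantly enhanced diversity, all within a unified simulation and evaluation framework tailored for MLLMs.
The tasks are organized into five categories (navigation, object interaction, social interaction, attribute question answering, and spatial question answering) to assess different capabilities of the agents.
We evaluated state-of-the-art MLLMs on EmbodiedEval and found that they fall significantly short of human-level performance on embodied tasks.
Our analysis demonstrates the limitations of existing MLLMs in embodied capabilities, providing insights for their future development.
We open-source all evaluation data and the simulation framework at https://github.com/thunlp/EmbodiedEval.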

Abstract (original)

Multimodal Large Language Models (MLLMs) have shown significant advancements, providing a promising future for embodied agents. Existing benchmarks for evaluating MLLMs primarily utilize static images or videos, limiting assessments to non-interactive scenarios. Meanwhile, existing embodied AI benchmarks are task-specific and not diverse enough, which do not adequately evaluate the embodied capabilities of MLLMs. To address this, we propose EmbodiedEval, a comprehensive and interactive evaluation benchmark for MLLMs with embodied tasks. EmbodiedEval features 328 distinct tasks within 125 varied 3D scenes, each of which is rigorously selected and annotated. It covers a broad spectrum of existing embodied AI tasks with significantly enhanced diversity, all within a unified simulation and evaluation framework tailored for MLLMs. The tasks are organized into five categories: navigation, object interaction, social interaction, attribute question answering, and spatial question answering to assess different capabilities of the agents. We evaluated the state-of-the-art MLLMs on EmbodiedEval and found that they have a significant shortfall compared to human level on embodied tasks. Our analysis demonstrates the limitations of existing MLLMs in embodied capabilities, providing insights for their future development. We open-source all evaluation data and simulation framework at https://github.com/thunlp/EmbodiedEval.

arXiv information

Authors	Zhili Cheng, Yuge Tu, Ran Li, Shiqi Dai, Jinyi Hu, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun
Published	2025-04-11 04:26:42+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.CL, cs.CV | Comments are closed
