jarxiv | Japanese arxiv | Page 562

Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities

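Posted: May 12, 2025 by jarxiv

Summary

Growing demand for generative AI delivered as Large Language Model (LLM) services is driving the need for specialized hardware architectures that optimize computational efficiency and energy consumption.
This paper evaluates the performance of the Tenstorrent Grayskull e75 RISC-V accelerator on basic linear algebra kernels at reduced numerical precision, a fundamental operation in LLM computation.
We present a detailed characterization of how Grayskull's execution model, grid size, matrix dimensions, data formats, and numerical precision affect computational efficiency.
We further compare Grayskull's performance against state-of-the-art architectures with tensor acceleration, including Intel Sapphire Rapids processors and two NVIDIA GPUs (V100 and A100).
While NVIDIA GPUs dominate in raw performance, Grayskull shows a competitive trade-off between power consumption and computational throughput, reaching a peak of 1.55 TFLOPS/Watt with BF16.

Summary (original)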

The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumption. This paper evaluates the performance of the Tenstorrent Grayskull e75 RISC-V accelerator for basic linear algebra kernels at reduced numerical precision, a fundamental operation in LLM computations. We present a detailed characterization of Grayskull’s execution model, gridsize, matrix dimensions, data formats, and numerical precision impact computational efficiency. Furthermore, we compare Grayskull’s performance against state-of-the-art architectures with tensor acceleration, including Intel Sapphire Rapids processors and two NVIDIA GPUs (V100 and A100). Whilst NVIDIA GPUs dominate raw performance, Grayskull demonstrates a competitive trade-off between power consumption and computational throughput, reaching a peak of 1.55 TFLOPs/Watt with BF16.
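The efficiency metric quoted above divides matmul throughput by power draw. The sketch below assumes the conventional count of 2·M·N·K floating-point operations for an (M×K)·(K×N) product, one multiply and one add per inner-product term; the timing and power values in the example are made-up placeholders, not measurements from the paper.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C[m x n] = A[m x k] @ B[k x n]: one multiply + one add per term."""
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS (10**12 floating-point operations per second)."""
    return matmul_flops(m, n, k) / seconds / 1e12


def tflops_per_watt(m: int, n: int, k: int, seconds: float, watts: float) -> float:
    """Energy efficiency: throughput divided by the average power draw."""
    return tflops(m, n, k, seconds) / watts


# Hypothetical run: a 4096^3 matmul taking 20 ms at an average draw of 50 W.
example_efficiency = tflops_per_watt(4096, 4096, 4096, seconds=0.020, watts=50.0)
```

With these placeholder numbers the result is about 0.137 TFLOPS/W. A figure such as the 1.55 TFLOPS/W peak is the same kind of ratio, although the abstract does not describe how power was measured.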

arXiv information

Authors	Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini
Published	2025-05-09 14:29:37+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.AR, cs.PF | Comments are closed

CoverUp: Effective High Coverage Test Generation for Python

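Posted: May 12, 2025 by jarxiv

Summary

Testing is an essential part of software development.
Test generation tools try to automate the labor-intensive task of writing tests, but generating high-coverage tests remains difficult.
This paper proposes CoverUp, a new approach for driving the generation of high-coverage Python regression tests.
CoverUp combines coverage analysis, code context, and feedback in prompts that iteratively guide an LLM to generate tests that improve line and branch coverage.
We evaluate a prototype CoverUp implementation on a benchmark of challenging code derived from open-source Python projects and show that CoverUp substantially improves on the state of the art.
Compared with CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%).
Compared with MuTAP, a mutation- and LLM-based test generator, CoverUp achieves an overall line+branch coverage of 89% (vs. 77%).
We also show that CoverUp's performance stems not only from the LLM used but from the combined effectiveness of its components.

Summary (original)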

Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains challenging. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp combines coverage analysis, code context, and feedback in prompts that iteratively guide the LLM to generate tests that improve line and branch coverage. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation- and LLM-based test generator, CoverUp achieves an overall line+branch coverage of 89% (vs. 77%). We also demonstrate that CoverUp’s performance stems not only from the LLM used but from the combined effectiveness of its components.
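The abstract describes a feedback loop: measure coverage, put the uncovered code into a prompt, ask the LLM for a test, run it, and repeat. The sketch below shows only that control flow, under simplifying assumptions. `generate_test` and `run_with_coverage` are hypothetical callbacks standing in for the LLM call and for a coverage tool. `fake_llm` and `fake_run` are toy stubs. None of these are CoverUp's actual API, and the real tool also tracks branches and feeds error messages back to the model.

```python
from typing import Callable, Optional


def coverage_guided_generation(
    all_lines: set[int],
    generate_test: Callable[[str], str],
    run_with_coverage: Callable[[str], Optional[set[int]]],
    max_rounds: int = 5,
) -> tuple[list[str], set[int]]:
    """Repeatedly prompt a test generator with the lines that are still uncovered."""
    covered: set[int] = set()
    kept_tests: list[str] = []
    for _ in range(max_rounds):
        missing = sorted(all_lines - covered)
        if not missing:
            break  # full coverage reached; stop prompting
        # Coverage feedback goes into the prompt: name the missing lines explicitly.
        prompt = f"Write a pytest test that executes lines {missing}."
        test_src = generate_test(prompt)
        newly_covered = run_with_coverage(test_src)
        if newly_covered is None:
            continue  # the test failed; a real tool would feed the error back instead
        if newly_covered - covered:
            kept_tests.append(test_src)  # keep only tests that add coverage
            covered |= newly_covered & all_lines


    return kept_tests, covered


# Hypothetical stubs standing in for the LLM and the coverage runner.
def fake_llm(prompt: str) -> str:
    # Target the first missing line listed between the square brackets.
    first = prompt[prompt.index("[") + 1 : prompt.index("]")].split(",")[0]
    return f"test_line_{first.strip()}"


def fake_run(test_src: str) -> Optional[set[int]]:
    return {int(test_src.rsplit("_", 1)[1])}


tests, cov = coverage_guided_generation({1, 2, 3}, fake_llm, fake_run)
```

Keeping only tests that add coverage mirrors the goal stated in the abstract, which is regression tests that raise line and branch coverage, not a pile of redundant tests.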

arXiv information

Authors	Juan Altmayer Pizzorno, Emery D. Berger
Published	2025-05-09 14:33:58+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.PL, cs.SE, D.2.5 | Comments are closed

Planet as a Brain: Towards Internet of AgentSites based on AIOS Server

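Posted: May 12, 2025 by jarxiv

Summary

The internet is undergoing a historic transformation from an "Internet of Websites" to an "Internet of AgentSites."
Traditional websites served as the foundation for hosting and disseminating information, but a new frontier is emerging in which AgentSites act as the hubs of the internet. Each AgentSite hosts one or more AI agents that receive tasks, address them, and deliver actionable solutions. This marks a major shift in the digital landscape and represents the next generation of online ecosystems.
Under this vision, AIOS, the AI Agent Operating System, serves as the server for developing, deploying, and running AI agents, and is a fundamental infrastructure of the Internet of AgentSites.
This paper introduces AIOS Server, a runtime framework that hosts agents and enables global-scale collaboration among decentralized agents.
AIOS Server provides a communication protocol built on the Model Context Protocol (MCP) and JSON-RPC to enable agent-agent and human-agent interactions.
Each AIOS node operates as a server that hosts and runs agents, while supporting peer-to-peer coordination without relying on centralized orchestration.
Building on AIOS Server, we further present the world's first practically deployed Internet of AgentSites (AIOS-IoA) at https://planet.aios.foundation, including AgentHub for agent registration and discovery and AgentChat for interactive communication.
An agent discovery mechanism based on Distributed Hash Tables (DHT) and a gossip protocol serves as the search engine of the Internet of AgentSites.
This work provides a practical foundation for building the Internet of AgentSites, a new paradigm in which autonomous agents become first-class citizens of the web.
The implementation is available at https://github.com/agiresearch/AIOS.Server and is integrated into the AIOS main branch at https://github.com/agiresearch/AIOS.

Summary (original)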

The internet is undergoing a historical transformation from the ‘Internet of Websites’ to the ‘Internet of AgentSites.’ While traditional Websites served as the foundation for information hosting and dissemination, a new frontier is emerging where AgentSites serve as the hubs of the internet, where each AgentSite hosts one or more AI agents that receive tasks, address them, and deliver actionable solutions, marking a significant shift in the digital landscape and representing the next generation of online ecosystems. Under this vision, AIOS, the AI Agent Operating System, serves as the server for the development, deployment and execution of AI agents, which is a fundamental infrastructure for the Internet of Agentsites. In this paper, we introduce AIOS Server, a runtime framework to host agents and enable global-scale collaboration among decentralized agents. AIOS Server provides a communication protocol leveraging the Model Context Protocol (MCP) and JSON-RPC to enable agent-agent or human-agent interactions. Each AIOS node operates as a server to host and execute agents, while supporting peer-to-peer coordination without reliance on centralized orchestration. Based on AIOS Server, we further present the world’s first practically deployed Internet of Agentsites (AIOS-IoA), including AgentHub for agent registration and discovery and AgentChat for interactive communication, at https://planet.aios.foundation. The agent discovery mechanism based on Distributed Hash Tables (DHT) and a Gossip protocol serves as the search engine for the internet of agentsites. This work provides a practical foundation for building the Internet of Agentsites, a new paradigm where autonomous agents become first-class citizens of the web. The implementation is available at https://github.com/agiresearch/AIOS.Server and is integrated into the AIOS main branch at https://github.com/agiresearch/AIOS.
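AIOS Server's messaging is described as building on JSON-RPC, alongside MCP. As a hedged illustration of a JSON-RPC 2.0 exchange between nodes, the sketch below builds a request and dispatches it against a local table of handlers. The method name `agent.run` and the handler `run_agent` are hypothetical and are not AIOS Server's actual interface. Only the envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) and the `-32601` "Method not found" code come from the JSON-RPC 2.0 specification.

```python
import json
from typing import Any, Callable


def make_request(method: str, params: dict, req_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request envelope."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": req_id})


def dispatch(raw: str, methods: dict[str, Callable[..., Any]]) -> str:
    """Handle one JSON-RPC 2.0 request against a table of local handlers."""
    req = json.loads(raw)
    if req.get("method") not in methods:
        # -32601 is the specification's "Method not found" error code.
        err = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "error": err, "id": req.get("id")})
    result = methods[req["method"]](**req.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req["id"]})


# Hypothetical agent handler hosted on this node.
def run_agent(agent: str, task: str) -> str:
    return f"{agent} accepted: {task}"


HANDLERS = {"agent.run": run_agent}
```

In a peer-to-peer setting, the same envelope would travel over whatever transport connects two nodes. The request `id` lets the caller match each response to its request.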

arXiv information

Authors	Xiang Zhang, Yongfeng Zhang
Published	2025-05-09 14:35:17+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.NI | Comments are closed

UniSymNet: A Unified Symbolic Network Guided by Transformer

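Posted: May 12, 2025 by jarxiv

Summary

Symbolic Regression (SR) is a powerful technique for automatically discovering mathematical expressions from input data.
Mainstream SR algorithms search a vast function space for the optimal symbolic tree, but the growing complexity of the tree structure limits their performance.
Inspired by neural networks, symbolic networks have emerged as a promising new paradigm.
However, most existing symbolic networks still face specific challenges: the binary nonlinear operators $\{\times, \div\}$ cannot be naturally extended to multivariate operators, and training with a fixed architecture often leads to higher complexity and overfitting.
In this work, we propose a Unified Symbolic Network (UniSymNet) that unifies nonlinear binary operators into nested unary operators, and we define the conditions under which UniSymNet can reduce complexity.
Furthermore, we pre-train a Transformer model with a novel label encoding method to guide structure selection, and adopt objective-specific optimization strategies to learn the parameters of the symbolic network.
UniSymNet shows high fitting accuracy, an excellent symbolic solution rate, and relatively low expression complexity, achieving competitive performance on the low-dimensional Standard Benchmarks and the high-dimensional SRBench.

Summary (original)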

Symbolic Regression (SR) is a powerful technique for automatically discovering mathematical expressions from input data. Mainstream SR algorithms search for the optimal symbolic tree in a vast function space, but the increasing complexity of the tree structure limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, most existing symbolic networks still face certain challenges: binary nonlinear operators $\{\times, \div\}$ cannot be naturally extended to multivariate operators, and training with fixed architecture often leads to higher complexity and overfitting. In this work, we propose a Unified Symbolic Network that unifies nonlinear binary operators into nested unary operators and define the conditions under which UniSymNet can reduce complexity. Moreover, we pre-train a Transformer model with a novel label encoding method to guide structural selection, and adopt objective-specific optimization strategies to learn the parameters of the symbolic network. UniSymNet shows high fitting accuracy, excellent symbolic solution rate, and relatively low expression complexity, achieving competitive performance on low-dimensional Standard Benchmarks and high-dimensional SRBench.
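One standard way to express $\{\times, \div\}$ through nested unary operators is the log-domain identity $x \cdot y = \exp(\ln x + \ln y)$ and $x / y = \exp(\ln x - \ln y)$ for $x, y > 0$. Weighting the inner sum extends it to any number of inputs as the power product $\prod_i x_i^{w_i}$. The sketch below only checks this identity numerically under the assumption of strictly positive inputs. The abstract does not say that UniSymNet parameterizes its layers exactly this way.

```python
import math


def power_product(xs: list[float], ws: list[float]) -> float:
    """exp(sum_i w_i * ln x_i) == prod_i x_i ** w_i, valid only for x_i > 0.

    One nested unary form (exp of a weighted sum of logs) covers multiplication
    (weights 1, 1), division (weights 1, -1), and any number of inputs.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("the log-domain identity assumes strictly positive inputs")
    return math.exp(sum(w * math.log(x) for x, w in zip(xs, ws)))
```

For example, `power_product([3.0, 4.0], [1, -1])` evaluates $3 / 4$. This illustrates why a unary formulation extends naturally to multivariate operators where a fixed binary node does not.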

arXiv information

Authors	Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin
Published	2025-05-09 14:38:25+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.SC | Comments are closed

Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs

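Posted: May 12, 2025 by jarxiv

Summary

Limitations in the capabilities of Large Language Models (LLMs) for hardware design tasks, such as generating functional Verilog code, have motivated various fine-tuning optimizations that use curated hardware datasets from open-source repositories.
However, these datasets remain limited in size and include only minimal checks on reuse licensing, which creates the potential for copyright violations by fine-tuned LLMs.
We therefore propose an evaluation benchmark for estimating the risk that Verilog-trained LLMs generate copyright-protected code.
To minimize this risk, we present FreeSet, an open-source Verilog dataset containing over 220k files, together with the automated dataset curation framework used to provide additional guarantees of fair-use Verilog data.
We then run an LLM fine-tuning framework consisting of continual pre-training, producing FreeV, a Llama model fine-tuned for Verilog.
Our results show that FreeV has the lowest risk of copyright infringement among prior works, with a violation rate of only 3%.
Experimental results also show improved Verilog generation capability over the baseline model, raising VerilogEval pass@10 rates by more than 10%.

Summary (original)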

Limitations in Large Language Model (LLM) capabilities for hardware design tasks, such as generating functional Verilog codes, have motivated various fine-tuning optimizations utilizing curated hardware datasets from open-source repositories. However, these datasets remain limited in size and contain minimal checks on licensing for reuse, resulting in potential copyright violations by fine-tuned LLMs. Therefore, we propose an evaluation benchmark to estimate the risk of Verilog-trained LLMs to generate copyright-protected codes. To minimize this risk, we present an open-source Verilog dataset, FreeSet, containing over 220k files, along with the automated dataset curation framework utilized to provide additional guarantees of fair-use Verilog data. We then execute an LLM fine-tuning framework consisting of continual pre-training, resulting in a fine-tuned Llama model for Verilog, FreeV. Our results indicate that FreeV demonstrates the smallest risk of copyright-infringement among prior works, with only a 3% violation rate. Furthermore, experimental results demonstrate improvements in Verilog generation functionality over its baseline model, improving VerilogEval pass@10 rates by over 10%.
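The improvement above is reported as VerilogEval pass@10. A common way to compute pass@k is the unbiased estimator introduced with the HumanEval/Codex evaluation. For each problem it generates $n$ samples, counts the $c$ that pass, and estimates the probability that at least one of $k$ drawn samples passes as $1 - \binom{n-c}{k} / \binom{n}{k}$. The sketch below assumes this estimator; the helper names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must include at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: average per-problem pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 20 samples of which one is correct, `pass_at_k(20, 1, 10)` is exactly 0.5, since half of all 10-sample draws contain the correct one.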

arXiv information

Authors	Sam Bush, Matthew DeLorenzo, Phat Tieu, Jeyavijayan Rajendran
Published	2025-05-09 14:44:07+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI | Comments are closed

Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification

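Posted: May 12, 2025 by jarxiv

Summary

This paper presents an innovative approach, called the credal wrapper, for formulating a credal set representation of model averaging for Bayesian neural networks (BNNs) and deep ensembles (DEs), which can improve uncertainty estimation in classification tasks.
Given a finite collection of single predictive distributions derived from BNNs or DEs, the proposed credal wrapper extracts an upper and a lower probability bound for each class. These bounds acknowledge the epistemic uncertainty that arises because only a limited number of distributions is available.
Such probability intervals over classes can be mapped to a convex set of probabilities (a credal set), from which a unique prediction can in turn be obtained using a transformation called the intersection probability transformation.
In this article, we conduct extensive experiments on several out-of-distribution (OOD) detection benchmarks. They cover various dataset pairs (CIFAR10/100 vs SVHN/Tiny-ImageNet, CIFAR10 vs CIFAR10-C, CIFAR100 vs CIFAR100-C, and ImageNet vs ImageNet-O) and different network architectures (such as VGG16, ResNet-18/50, EfficientNet B2, and ViT Base).
Compared with the BNN and DE baselines, the proposed credal wrapper shows superior performance in uncertainty estimation and achieves a lower expected calibration error on corrupted data.

Summary (original)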

This paper presents an innovative approach, called credal wrapper, to formulating a credal set representation of model averaging for Bayesian neural networks (BNNs) and deep ensembles (DEs), capable of improving uncertainty estimation in classification tasks. Given a finite collection of single predictive distributions derived from BNNs or DEs, the proposed credal wrapper approach extracts an upper and a lower probability bound per class, acknowledging the epistemic uncertainty due to the availability of a limited amount of distributions. Such probability intervals over classes can be mapped on a convex set of probabilities (a credal set) from which, in turn, a unique prediction can be obtained using a transformation called intersection probability transformation. In this article, we conduct extensive experiments on several out-of-distribution (OOD) detection benchmarks, encompassing various dataset pairs (CIFAR10/100 vs SVHN/Tiny-ImageNet, CIFAR10 vs CIFAR10-C, CIFAR100 vs CIFAR100-C and ImageNet vs ImageNet-O) and using different network architectures (such as VGG16, ResNet-18/50, EfficientNet B2, and ViT Base). Compared to the BNN and DE baselines, the proposed credal wrapper method exhibits superior performance in uncertainty estimation and achieves a lower expected calibration error on corrupted data.
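The two steps in the abstract, per-class bounds and then a single prediction, can be sketched in a few lines. This is a minimal sketch under stated assumptions. It takes the bounds as the per-class minimum and maximum over the members' predictive distributions. It uses the intersection probability form $p_i = l_i + \beta (u_i - l_i)$ with $\beta = (1 - \sum_j l_j) / \sum_j (u_j - l_j)$. The paper's exact definitions may differ in detail.

```python
def credal_bounds(dists: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-class lower/upper probability bounds across a finite set of predictions."""
    lower = [min(col) for col in zip(*dists)]
    upper = [max(col) for col in zip(*dists)]
    return lower, upper


def intersection_probability(lower: list[float], upper: list[float]) -> list[float]:
    """Single distribution p_i = l_i + beta * (u_i - l_i) taken from the intervals."""
    width = sum(up_i - lo_i for lo_i, up_i in zip(lower, upper))
    if width == 0.0:
        return list(lower)  # all members agree; the intervals collapse to one point
    beta = (1.0 - sum(lower)) / width  # the same fraction of every interval is used
    return [lo_i + beta * (up_i - lo_i) for lo_i, up_i in zip(lower, upper)]


# Two ensemble members that disagree on a 2-class problem.
members = [[0.6, 0.4], [0.2, 0.8]]
lo, hi = credal_bounds(members)
p = intersection_probability(lo, hi)
```

Here the bounds are [0.2, 0.6] and [0.4, 0.8], and the resulting prediction is approximately [0.4, 0.6]. The interval widths carry the epistemic uncertainty that a plain average would hide.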

arXiv information

Authors	Kaizheng Wang, Fabio Cuzzolin, Keivan Shariatmadar, David Moens, Hans Hallez
Published	2025-05-09 14:56:04+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG | Comments are closed

LLMs Outperform Experts on Challenging Biology Benchmarks

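Posted: May 12, 2025 by jarxiv

Summary

This study systematically evaluates 27 frontier large language models on eight diverse biology benchmarks spanning molecular biology, genetics, cloning, virology, and biosecurity.
Models from major AI developers released between November 2022 and April 2025 were assessed through ten independent runs per benchmark.
The findings reveal dramatic improvements in biological capabilities.
Top-model performance increased more than fourfold over the study period on the challenging text-only subset of the Virology Capabilities Test, and the top model now performs twice as well as expert virologists.
Several models now match or exceed expert-level performance on other challenging benchmarks, including LAB-Bench CloningScenarios and the biology subsets of GPQA and WMDP.
Contrary to expectations, chain-of-thought prompting did not substantially improve performance over zero-shot evaluation, whereas the extended reasoning features of o3-mini and Claude 3.7 Sonnet generally improved performance, as predicted by inference scaling.
Benchmarks such as PubMedQA and the MMLU and WMDP biology subsets showed performance plateaus well below 100%, suggesting benchmark saturation and errors in the underlying benchmark data.
The analysis highlights the need for more sophisticated evaluation methodologies as AI systems continue to advance.

Summary (original)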

This study systematically evaluates 27 frontier Large Language Models on eight diverse biology benchmarks spanning molecular biology, genetics, cloning, virology, and biosecurity. Models from major AI developers released between November 2022 and April 2025 were assessed through ten independent runs per benchmark. The findings reveal dramatic improvements in biological capabilities. Top model performance increased more than 4-fold on the challenging text-only subset of the Virology Capabilities Test over the study period, with the top model now performing twice as well as expert virologists. Several models now match or exceed expert-level performance on other challenging benchmarks, including LAB-Bench CloningScenarios and the biology subsets of GPQA and WMDP. Contrary to expectations, chain-of-thought did not substantially improve performance over zero-shot evaluation, while extended reasoning features in o3-mini and Claude 3.7 Sonnet typically improved performance as predicted by inference scaling. Benchmarks such as PubMedQA and the MMLU and WMDP biology subsets exhibited performance plateaus well below 100%, suggesting benchmark saturation and errors in the underlying benchmark data. The analysis highlights the need for more sophisticated evaluation methodologies as AI systems continue to advance.

arXiv information

Authors	Lennart Justen
Published	2025-05-09 15:05:57+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, q-bio.QM | Comments are closed

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

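Posted: May 12, 2025 by jarxiv

Summary

This project performs multimodal sentiment analysis on the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate the text, audio, and visual modalities.
BERT-based encoders are used for each modality to extract embeddings, which are concatenated before classification.
The model achieves strong performance, with 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions.
Training used Adam optimization (lr = 1e-4), dropout (0.3), and early stopping to ensure generalization and robustness.
The results highlight the strength of transformer architectures in modeling multimodal sentiment, with a low MAE (0.1060) indicating precise sentiment-intensity prediction.
Future work may compare fusion strategies or improve interpretability.
This approach applies multimodal learning by effectively combining linguistic, acoustic, and visual cues for sentiment analysis.

Summary (original)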

This project performs multimodal sentiment analysis using the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate text, audio, and visual modalities. We employ BERT-based encoders for each modality, extracting embeddings that are concatenated before classification. The model achieves strong performance, with 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions. The training utilized Adam optimization (lr=1e-4), dropout (0.3), and early stopping to ensure generalization and robustness. Results highlight the superiority of transformer architectures in modeling multimodal sentiment, with a low MAE (0.1060) indicating precise sentiment intensity prediction. Future work may compare fusion strategies or enhance interpretability. This approach utilizes multimodal learning by effectively combining linguistic, acoustic, and visual cues for sentiment analysis.
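Early fusion, as described here, means concatenating the per-modality embeddings into one vector before a shared classifier. The plain-Python sketch below assumes tiny made-up embedding sizes and a single linear layer with a softmax over the 7 sentiment classes. The embedding values and the weights `W` are hypothetical. The actual model uses BERT-based encoders and trained weights, which are not reproduced here.

```python
import math


def early_fusion(text: list[float], audio: list[float], visual: list[float]) -> list[float]:
    """Concatenate modality embeddings into one joint feature vector."""
    return text + audio + visual


def linear_softmax(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """One linear layer (one weight row per class) followed by a stable softmax."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max logit to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


NUM_CLASSES = 7  # CMU-MOSEI 7-class sentiment labels
fused = early_fusion([0.1, 0.2], [0.3], [0.4, 0.5])  # hypothetical 2 + 1 + 2 dims
# Hypothetical weights: each class row must match the fused dimension (5 here).
W = [[0.01 * (c + 1)] * len(fused) for c in range(NUM_CLASSES)]
probs = linear_softmax(fused, W, [0.0] * NUM_CLASSES)
```

Because fusion happens before the classifier, every weight row sees features from all three modalities at once. This is what allows early fusion to model cross-modal interactions.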

arXiv information

Authors	Jugal Gajjar, Kaustik Ranaware
Published	2025-05-09 15:10:57+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL | Comments are closed

UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

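Posted: May 12, 2025 by jarxiv

Summary

A generalist robot should perform effectively across various environments.
However, most existing approaches rely heavily on scaling action-annotated data to enhance their capabilities.
As a result, they are often limited to a single physical specification and struggle to learn transferable knowledge across different embodiments and environments.
To address these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies.
Our key innovation is to derive task-centric action representations from videos using a latent action model.
This lets us exploit extensive data across a wide range of embodiments and perspectives.
To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and build the latent action model within the DINO feature space.
Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding.
We obtain state-of-the-art results on multiple manipulation and navigation benchmarks, as well as in real-robot deployments.
UniVLA outperforms OpenVLA while using less than 1/20 of the pretraining compute and 1/10 of the downstream data.
Performance continues to improve as heterogeneous data, even including human videos, is incorporated into the training pipeline.
The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.

Summary (original)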

A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA’s potential to facilitate scalable and efficient robot policy learning.

arXiv information

Authors	Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, Hongyang Li
Published	2025-05-09 15:11:13+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.RO | Comments are closed

The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support

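Posted: May 12, 2025 by jarxiv

Summary

People experiencing severe distress are increasingly using large language model (LLM) chatbots as mental health support tools.
Discussions on social media describe how these engagements were lifesaving for some. However, evidence suggests that general-purpose LLM chatbots also carry notable risks that could endanger users' welfare if they are not designed responsibly.
In this study, we investigate the lived experiences of people who have used LLM chatbots for mental health support.
Building on interviews with 21 individuals from globally diverse backgrounds, we analyze how users create unique support roles for their chatbots, fill gaps in everyday care, and navigate the cultural limitations associated with seeking support from chatbots.
We ground our analysis in the psychotherapy literature on effective support and introduce the concept of therapeutic alignment, that is, aligning AI with therapeutic values in mental health contexts.
Our study offers recommendations for how designers can approach the ethical and effective use of LLM chatbots and other AI mental health support tools in mental health care.

Summary (original)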

People experiencing severe distress increasingly use Large Language Model (LLM) chatbots as mental health support tools. Discussions on social media have described how engagements were lifesaving for some, but evidence suggests that general-purpose LLM chatbots also have notable risks that could endanger the welfare of users if not designed responsibly. In this study, we investigate the lived experiences of people who have used LLM chatbots for mental health support. We build on interviews with 21 individuals from globally diverse backgrounds to analyze how users create unique support roles for their chatbots, fill in gaps in everyday care, and navigate associated cultural limitations when seeking support from chatbots. We ground our analysis in psychotherapy literature around effective support, and introduce the concept of therapeutic alignment, or aligning AI with therapeutic values for mental health contexts. Our study offers recommendations for how designers can approach the ethical and effective use of LLM chatbots and other AI mental health support tools in mental health care.

arXiv information

Authors	Inhwa Song, Sachin R. Pendse, Neha Kumar, Munmun De Choudhury
Published	2025-05-09 15:24:02+00:00
arXiv site	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google

Categories: cs.AI, cs.CY, cs.HC | Comments are closed
