jarxiv | Japanese arxiv

MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains

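Posted: June 13, 2025 | Author: jarxiv

Summary

Humanoid robots have demonstrated robust locomotion using reinforcement learning (RL)-based approaches.
To obtain human-like behavior, existing methods additionally integrate human motion tracking or motion priors into the RL framework.
However, these methods are limited to flat terrain with proprioception only, which restricts their ability to traverse challenging terrain with human-like gaits.
In this work, we propose a novel framework that uses a mixture of latent residual experts with multiple discriminators to train an RL policy capable of traversing complex terrain in controllable, lifelike gaits using exteroception.
Our two-stage training pipeline first teaches the policy to traverse complex terrain using a depth camera, and then enables gait-commanded switching between human-like gait patterns.
We also design gait rewards to adjust human-like behaviors such as the robot's base height.
Simulation and real-world experiments show that the framework performs exceptionally well when traversing complex terrain and achieves seamless transitions between multiple human-like gait patterns.

Summary (original)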

Humanoid robots have demonstrated robust locomotion capabilities using Reinforcement Learning (RL)-based approaches. Further, to obtain human-like behaviors, existing methods integrate human motion-tracking or motion prior in the RL framework. However, these methods are limited in flat terrains with proprioception only, restricting their abilities to traverse challenging terrains with human-like gaits. In this work, we propose a novel framework using a mixture of latent residual experts with multi-discriminators to train an RL policy, which is capable of traversing complex terrains in controllable lifelike gaits with exteroception. Our two-stage training pipeline first teaches the policy to traverse complex terrains using a depth camera, and then enables gait-commanded switching between human-like gait patterns. We also design gait rewards to adjust human-like behaviors like robot base height. Simulation and real-world experiments demonstrate that our framework exhibits exceptional performance in traversing complex terrains, and achieves seamless transitions between multiple human-like gait patterns.

arXiv information

Authors	Dewei Wang, Xinmiao Wang, Xinzhe Liu, Jiyuan Shi, Yingnan Zhao, Chenjia Bai, Xuelong Li
Published	2025-06-12 03:06:12+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.RO

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

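Posted: June 13, 2025 | Author: jarxiv

Summary

Robust navigation across diverse environments and domains requires both accurate state estimation and transparent decision making.
PhysNav-DG is a novel framework that integrates classical sensor fusion with the semantic power of vision-language models.
Its dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations.
A modified adaptive Kalman filter dynamically adjusts its noise parameters based on environmental context.
It leverages several streams of raw sensor data together with semantic insights from models such as LLaMA 3.2 11B and BLIP-2.
To evaluate the approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations.
Extensive experiments and ablations show that PhysNav-DG improves navigation success rates by over 20% and achieves high efficiency, with explanations that are both highly grounded and clear.
This work connects high-level semantic reasoning and geometric planning for safer and more trustworthy autonomous systems.

Summary (original)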

Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and ablations show that PhysNav-DG improves navigation success rates by over 20% and achieves high efficiency, with explanations that are both highly grounded and clear. This work connects high-level semantic reasoning and geometric planning for safer and more trustworthy autonomous systems.
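The idea of a Kalman filter whose noise parameters depend on environmental context can be illustrated with a minimal one-dimensional sketch. This is not the PhysNav-DG filter: the abstract does not give its adaptation rule, so the context labels, the `CONTEXT_NOISE_SCALE` table, and the `AdaptiveKalman1D` class are all hypothetical and only show how inflating the measurement-noise variance lowers the Kalman gain.

```python
from dataclasses import dataclass

# Hypothetical mapping from a context label to a measurement-noise multiplier.
# The abstract does not specify how context is encoded; this is an assumption.
CONTEXT_NOISE_SCALE = {"open": 1.0, "cluttered": 4.0, "low_light": 9.0}


@dataclass
class AdaptiveKalman1D:
    x: float = 0.0       # state estimate
    p: float = 1.0       # variance of the estimate
    q: float = 0.01      # process-noise variance
    r_base: float = 0.1  # nominal measurement-noise variance

    def step(self, z: float, context: str) -> float:
        """Run one predict/update cycle and return the Kalman gain used."""
        # Predict: the state is modeled as constant, so only the variance grows.
        self.p += self.q
        # Adapt: inflate the measurement noise where the sensor is trusted less.
        r = self.r_base * CONTEXT_NOISE_SCALE.get(context, 1.0)
        # Update: standard scalar Kalman gain and correction.
        k = self.p / (self.p + r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return k


if __name__ == "__main__":
    for ctx in ("open", "low_light"):
        kf = AdaptiveKalman1D()
        gain = kf.step(z=1.0, context=ctx)
        print(ctx, round(gain, 3), round(kf.x, 3))
```

With the same measurement, the filter moves less toward the reading in the `low_light` context than in the `open` context, because the larger assumed noise variance reduces the gain.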

arXiv information

Authors	Trisanth Srinivasan, Santosh Patapati
Published	2025-06-12 05:18:24+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.CV, cs.LG, cs.MM, cs.RO

Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success

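Posted: June 13, 2025 | Author: jarxiv

Summary

This work shows how autonomously learning aspects of robotic operation from sparsely labeled, real-world data of deployed, engineered solutions at industrial scale can yield solutions with improved performance.
Specifically, it focuses on multi-suction robot picking and presents a comprehensive study of applying multi-modal visual encoders to predict the success of candidate robotic picks.
Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings such as warehouses.
Methods for picking from clutter must work for an open set of items while meeting latency constraints to achieve high throughput.
The demonstrated approach uses multiple input modalities, such as RGB, depth, and semantic segmentation, to estimate the quality of candidate multi-suction picks.
The strategy is trained on real-world item-picking data with a combination of multimodal pretraining and finetuning.
The manuscript provides a comprehensive experimental evaluation over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset that focuses on containers such as boxes and envelopes rather than unpackaged items.
The evaluation measures performance across different item configurations, pick scenes, and object types.
Ablations help clarify the effect of in-domain pretraining, the impact of different modalities, and the importance of finetuning.
These ablations reveal both the importance of training over multiple modalities and the ability of models to learn the relationship between modalities during pretraining, so that only a subset of them can be used as input during finetuning and inference.

Summary (original)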

This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can provide with solutions that achieve improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The strategy is trained from real-world item picking data, with a combination of multimodal pretrain and finetune. The manuscript provides comprehensive experimental evaluation performed over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset, which focuses on containers, such as boxes and envelopes, instead of unpackaged items. The evaluation measures performance for different item configurations, pick scenes, and object types. Ablations help to understand the effects of in-domain pretraining, the impact of different modalities and the importance of finetuning. These ablations reveal both the importance of training over multiple modalities but also the ability of models to learn during pretraining the relationship between modalities so that during finetuning and inference, only a subset of them can be used as input.

arXiv information

Authors	Che Wang, Jeroen van Baar, Chaitanya Mitash, Shuai Li, Dylan Randle, Weiyao Wang, Sumedh Sontakke, Kostas E. Bekris, Kapil Katyal
Published	2025-06-12 05:35:51+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.LG, cs.RO

Towards more efficient quantitative safety validation of residual risk for assisted and automated driving

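Posted: June 13, 2025 | Author: jarxiv

Summary

The safety validation of Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS) increasingly demands efficient and reliable methods for quantifying residual risk while adhering to international standards such as ISO 21448.
Traditionally, Field Operational Testing (FOT) has been pivotal for macroscopic safety validation of automotive driving functions up to SAE automation level 2.
However, state-of-the-art derivations for empirical safety demonstrations using FOT often result in impractical testing effort, particularly at higher automation levels.
Even at lower automation levels, this limitation, together with the substantial cost of FOT, motivates exploring approaches that make FOT-based macroscopic safety validation more efficient.
Therefore, this publication systematically identifies and evaluates state-of-the-art Reduction Approaches (RAs) for FOT, including novel methods reported in the literature.
Based on an analysis of ISO 21448, two models are derived: a generic model capturing the argumentation components of the standard, and a base model, applied by way of example to Automatic Emergency Braking (AEB) systems, that establishes a baseline for the real-world driving requirement of a Quantitative Safety Validation of Residual Risk (QSVRR).
The RAs are then assessed against four criteria (quantifiability, threats to validity, missing links, and black-box compatibility), highlighting potential benefits and inherent limitations and identifying key areas for further research.
Our evaluation shows that, while several approaches offer potential, none is free from missing links or other substantial shortcomings.
Moreover, no identified alternative can fully replace FOT, reflecting its crucial role in the safety validation of ADAS and ADS.

Summary (original)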

The safety validation of Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS) increasingly demands efficient and reliable methods to quantify residual risk while adhering to international standards such as ISO 21448. Traditionally, Field Operational Testing (FOT) has been pivotal for macroscopic safety validation of automotive driving functions up to SAE automation level 2. However, state-of-the-art derivations for empirical safety demonstrations using FOT often result in impractical testing efforts, particularly at higher automation levels. Even at lower automation levels, this limitation – coupled with the substantial costs associated with FOT – motivates the exploration of approaches to enhance the efficiency of FOT-based macroscopic safety validation. Therefore, this publication systematically identifies and evaluates state-of-the-art Reduction Approaches (RAs) for FOT, including novel methods reported in the literature. Based on an analysis of ISO 21448, two models are derived: a generic model capturing the argumentation components of the standard, and a base model, exemplarily applied to Automatic Emergency Braking (AEB) systems, establishing a baseline for the real-world driving requirement for a Quantitative Safety Validation of Residual Risk (QSVRR). Subsequently, the RAs are assessed using four criteria: quantifiability, threats to validity, missing links, and black box compatibility, highlighting potential benefits, inherent limitations, and identifying key areas for further research. Our evaluation reveals that, while several approaches offer potential, none are free from missing links or other substantial shortcomings. Moreover, no identified alternative can fully replace FOT, reflecting its crucial role in the safety validation of ADAS and ADS.

arXiv information

Authors	Daniel Betschinske, Malte Schrimpf, Steven Peters, Kamil Klonecki, Jan Peter Karch, Moritz Lippert
Published	2025-06-12 05:41:53+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO

Simultaneous Localization and Affordance Prediction of Tasks from Egocentric Video

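Posted: June 13, 2025 | Author: jarxiv

Summary

Vision-Language Models (VLMs) have shown great success as foundation models for downstream vision and natural language applications in a variety of domains.
However, these models are limited to reasoning over objects and actions currently visible on the image plane.
We present a spatial extension to the VLM that leverages spatially localized egocentric video demonstrations to augment VLMs in two ways: by understanding spatial task affordances, i.e. where an agent must be for the task to physically take place, and by localizing that task relative to the egocentric viewer.
We show that our approach outperforms the baseline of using a VLM to map the similarity of a task's description over a set of location-tagged images.
Our approach has lower error both when predicting where a task may take place and when predicting which tasks are likely to happen at the current location.
The resulting representation will enable robots to use egocentric sensing to navigate to, or around, physical regions of interest for novel tasks specified in natural language.

Summary (original)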

Vision-Language Models (VLMs) have shown great success as foundational models for downstream vision and natural language applications in a variety of domains. However, these models are limited to reasoning over objects and actions currently visible on the image plane. We present a spatial extension to the VLM, which leverages spatially-localized egocentric video demonstrations to augment VLMs in two ways — through understanding spatial task-affordances, i.e. where an agent must be for the task to physically take place, and the localization of that task relative to the egocentric viewer. We show our approach outperforms the baseline of using a VLM to map similarity of a task’s description over a set of location-tagged images. Our approach has less error both on predicting where a task may take place and on predicting what tasks are likely to happen at the current location. The resulting representation will enable robots to use egocentric sensing to navigate to, or around, physical regions of interest for novel tasks specified in natural language.

arXiv information

Authors	Zachary Chavis, Hyun Soo Park, Stephen J. Guy
Published	2025-06-12 05:52:34+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.CV, cs.RO

EAST: Environment Aware Safe Tracking using Planning and Control Co-Design

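Posted: June 13, 2025 | Author: jarxiv

Summary

This paper considers the problem of autonomous mobile robot navigation in unknown environments with moving obstacles.
We propose a new method for environment-aware safe tracking (EAST) of robot motion plans that integrates an obstacle clearance cost for path planning, a convex reachable set for robot motion prediction, and safety constraints for dynamic obstacle avoidance.
EAST adapts the robot's motion to the locally sensed environment geometry and dynamics, leading to fast motion in wide open areas and cautious behavior in narrow passages or near moving obstacles.
Our control design uses a reference governor, a virtual dynamical system that guides the robot's motion and decouples the path-tracking and safety objectives.
While reference governor methods have been used for safe tracking control in static environments, our key contribution is an extension to dynamic environments using convex optimization with control barrier function (CBF) constraints.
Our work thus establishes a connection between reference governor techniques and CBF techniques for safe control in dynamic environments.
We validate the approach in simulated and real-world environments featuring complex obstacle configurations and natural dynamic obstacle motion.

Summary (original)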

This paper considers the problem of autonomous mobile robot navigation in unknown environments with moving obstacles. We propose a new method to achieve environment-aware safe tracking (EAST) of robot motion plans that integrates an obstacle clearance cost for path planning, a convex reachable set for robot motion prediction, and safety constraints for dynamic obstacle avoidance. EAST adapts the motion of the robot according to the locally sensed environment geometry and dynamics, leading to fast motion in wide open areas and cautious behavior in narrow passages or near moving obstacles. Our control design uses a reference governor, a virtual dynamical system that guides the robot’s motion and decouples the path tracking and safety objectives. While reference governor methods have been used for safe tracking control in static environments, our key contribution is an extension to dynamic environments using convex optimization with control barrier function (CBF) constraints. Thus, our work establishes a connection between reference governor techniques and CBF techniques for safe control in dynamic environments. We validate our approach in simulated and real-world environments, featuring complex obstacle configurations and natural dynamic obstacle motion.
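For readers unfamiliar with control barrier function constraints, a generic single-constraint safety filter can be sketched as follows. This is not the EAST controller and contains no reference governor: it assumes a single-integrator robot (velocity is the control input), one circular obstacle moving at a known velocity, and the barrier h(x) = ||x - c||^2 - r^2. The function name `cbf_filter` and all of its parameters are hypothetical. With a single linear constraint, the minimum-deviation quadratic program has the closed-form projection used here.

```python
def cbf_filter(x, u_nom, obs_center, obs_vel, obs_radius, alpha=1.0):
    """Return the control closest to u_nom that satisfies dh/dt >= -alpha * h.

    Assumes single-integrator dynamics x' = u and a circular obstacle,
    with h(x) = ||x - c||^2 - r^2 (h >= 0 means the robot is outside).
    """
    dx = [xi - ci for xi, ci in zip(x, obs_center)]
    h = sum(d * d for d in dx) - obs_radius ** 2
    # dh/dt = a . (u - v) with a = 2 (x - c), so the constraint is a . u >= b.
    a = [2.0 * d for d in dx]
    b = -alpha * h + sum(ai * vi for ai, vi in zip(a, obs_vel))
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        return list(u_nom)  # nominal control already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("robot is at the obstacle center; constraint is degenerate")
    # Project u_nom onto the half-space boundary a . u = b.
    return [ui - (slack / norm_sq) * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Robot at (-2, 0) driving straight at a static unit-radius obstacle at the origin.
    print(cbf_filter((-2.0, 0.0), (3.0, 0.0), (0.0, 0.0), (0.0, 0.0), 1.0))
```

In this example the commanded approach speed is reduced so that the barrier constraint holds with equality, while a command that moves away from the obstacle is passed through unchanged.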

arXiv information

Authors	Zhichao Li, Yinzhuang Yi, Zhuolin Niu, Nikolay Atanasov
Published	2025-06-12 05:55:28+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO, cs.SY, eess.SY

RICE: Reactive Interaction Controller for Cluttered Canopy Environment

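Posted: June 13, 2025 | Author: jarxiv

Summary

Robotic navigation in dense, cluttered environments such as agricultural canopies presents significant challenges because of the physical and visual occlusion caused by leaves and branches.
Traditional vision-based or model-dependent approaches often fail in these settings, where reaching a target requires physical interaction that does not damage foliage and branches.
We present a novel reactive controller that uses end-effector position and real-time tactile feedback to enable safe navigation of a robotic arm in a contact-rich, cluttered, deformable environment.
The interaction strategy of the proposed framework is based on a trade-off between minimizing disturbance by maneuvering around obstacles and pushing through them to move toward the target.
Over 35 trials in 3 experimental plant setups with an occluded target, the proposed controller reached the target in every trial without breaking any branch and outperformed the state-of-the-art model-free controller in robustness and adaptability.
This work lays the foundation for safe, adaptive interaction in cluttered, contact-rich deformable environments, enabling future agricultural tasks such as pruning and harvesting in plant canopies.

Summary (original)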

Robotic navigation in dense, cluttered environments such as agricultural canopies presents significant challenges due to physical and visual occlusion caused by leaves and branches. Traditional vision-based or model-dependent approaches often fail in these settings, where physical interaction without damaging foliage and branches is necessary to reach a target. We present a novel reactive controller that enables safe navigation for a robotic arm in a contact-rich, cluttered, deformable environment using end-effector position and real-time tactile feedback. Our proposed framework’s interaction strategy is based on a trade-off between minimizing disturbance by maneuvering around obstacles and pushing through them to move towards the target. We show that over 35 trials in 3 experimental plant setups with an occluded target, the proposed controller successfully reached the target in all trials without breaking any branch and outperformed the state-of-the-art model-free controller in robustness and adaptability. This work lays the foundation for safe, adaptive interaction in cluttered, contact-rich deformable environments, enabling future agricultural tasks such as pruning and harvesting in plant canopies.

arXiv information

Authors	Nidhi Homey Parayil, Thierry Peynot, Chris Lehnert
Published	2025-06-12 06:19:22+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO, cs.SY, eess.SY

AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving

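Posted: June 13, 2025 | Author: jarxiv

Summary

Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggles with hallucinations, inefficient reasoning, and limited real-world validation hinder accurate perception and robust step-by-step reasoning.
To overcome this, we introduce AgentThink, a pioneering unified framework that, for the first time, integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks.
AgentThink's core innovations include: (i) Structured data generation, which establishes an autonomous driving tool library to automatically construct structured, self-verified reasoning data that explicitly incorporates tool usage for diverse driving scenarios;
(ii) A two-stage training pipeline, which employs Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to equip VLMs with the capability for autonomous tool invocation; and
(iii) Agent-style tool-usage evaluation, which introduces a novel multi-tool assessment protocol to rigorously evaluate the model's tool invocation and utilization.
Experiments on the DriveLMM-o1 benchmark show that AgentThink boosts overall reasoning scores by 53.91% and answer accuracy by 33.54%, while markedly improving reasoning quality and consistency.
Furthermore, ablation studies and robust zero-shot/few-shot generalization experiments across various benchmarks underscore its powerful capabilities.
These findings highlight a promising trajectory for developing trustworthy and tool-aware autonomous driving models.

Summary (original)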

Vision-Language Models (VLMs) show promise for autonomous driving, yet their struggle with hallucinations, inefficient reasoning, and limited real-world validation hinders accurate perception and robust step-by-step reasoning. To overcome this, we introduce AgentThink, a pioneering unified framework that, for the first time, integrates Chain-of-Thought (CoT) reasoning with dynamic, agent-style tool invocation for autonomous driving tasks. AgentThink’s core innovations include: (i) Structured Data Generation, by establishing an autonomous driving tool library to automatically construct structured, self-verified reasoning data explicitly incorporating tool usage for diverse driving scenarios; (ii) A Two-stage Training Pipeline, employing Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to equip VLMs with the capability for autonomous tool invocation; and (iii) Agent-style Tool-Usage Evaluation, introducing a novel multi-tool assessment protocol to rigorously evaluate the model’s tool invocation and utilization. Experiments on the DriveLMM-o1 benchmark demonstrate AgentThink significantly boosts overall reasoning scores by 53.91% and enhances answer accuracy by 33.54%, while markedly improving reasoning quality and consistency. Furthermore, ablation studies and robust zero-shot/few-shot generalization experiments across various benchmarks underscore its powerful capabilities. These findings highlight a promising trajectory for developing trustworthy and tool-aware autonomous driving models.
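The "group relative" part of GRPO can be illustrated with a short sketch of the advantage normalization as it is commonly described: several responses are sampled for the same prompt, and each reward is standardized against the group's mean and standard deviation. This is not AgentThink's training code; the function name, the `eps` stabilizer, the choice of population standard deviation, and the example rewards are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group.

    rewards: scalar rewards for responses sampled from the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every response in the group receives the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled responses to one driving question.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that score above the group average receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.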

arXiv information

Authors	Kangan Qian, Sicong Jiang, Yang Zhong, Ziang Luo, Zilin Huang, Tianze Zhu, Kun Jiang, Mengmeng Yang, Zheng Fu, Jinyu Miao, Yining Shi, He Zhe Lim, Li Liu, Tianbao Zhou, Huang Yu, Yifei Hu, Guang Li, Guang Chen, Hao Ye, Lijun Sun, Diange Yang
Published	2025-06-12 06:27:06+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.CL, cs.CV, cs.RO

An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem

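Posted: June 13, 2025 | Author: jarxiv

Summary

The Agile Earth Observation Satellite Scheduling Problem (AEOSSP) entails finding the subset of observation targets to be scheduled along the satellite's orbit while meeting operational constraints on time, energy, and memory.
The problem of deciding what and when to observe is inherently complex, and becomes even more challenging when considering several issues that compromise the quality of the captured images, such as cloud occlusion, atmospheric turbulence, and image resolution.
This paper presents a Deep Reinforcement Learning (DRL) approach for addressing the AEOSSP with time-dependent profits, integrating these three factors to optimize the use of energy and memory resources.
The proposed method involves a dual decision-making process: selecting the sequence of targets and determining the optimal observation time for each.
Our results show that the proposed algorithm reduces the capture of images that fail to meet quality requirements by more than 60% and consequently decreases energy waste from attitude maneuvers by up to 78%, all while maintaining strong observation performance.

Summary (original)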

The Agile Earth Observation Satellite Scheduling Problem (AEOSSP) entails finding the subset of observation targets to be scheduled along the satellite’s orbit while meeting operational constraints of time, energy and memory. The problem of deciding what and when to observe is inherently complex, and becomes even more challenging when considering several issues that compromise the quality of the captured images, such as cloud occlusion, atmospheric turbulence, and image resolution. This paper presents a Deep Reinforcement Learning (DRL) approach for addressing the AEOSSP with time-dependent profits, integrating these three factors to optimize the use of energy and memory resources. The proposed method involves a dual decision-making process: selecting the sequence of targets and determining the optimal observation time for each. Our results demonstrate that the proposed algorithm reduces the capture of images that fail to meet quality requirements by > 60% and consequently decreases energy waste from attitude maneuvers by up to 78%, all while maintaining strong observation performance.

arXiv information

Authors	Antonio M. Mercado-Martínez, Beatriz Soret, Antonio Jurado-Navas
Published	2025-06-12 07:00:53+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO

Safety-Ensured Robotic Control Framework for Cutting Task Automation in Endoscopic Submucosal Dissection

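Posted: June 13, 2025 | Author: jarxiv

Summary

There is growing interest in automating surgical tasks with robotic systems, such as endoscopy for treating gastrointestinal (GI) cancer.
However, previous studies have focused primarily on detecting and analyzing objects or robots, with limited attention to ensuring safety, which is critical for clinical applications, where accidents can be caused by unsafe robot motions.
In this study, we propose a new control framework that can formally ensure safety when an endoscopic robot automates the cutting task in endoscopic submucosal dissection (ESD), a representative endoscopic surgical method for the treatment of early GI cancer.
The proposed framework uses Control Barrier Functions (CBFs) to accurately identify the boundaries of individual tumors, even when they are in close proximity within the GI tract, ensuring precise treatment and removal while preserving the surrounding normal tissue.
Additionally, by adopting a model-free control scheme, safety assurance is made possible even in endoscopic robotic systems where dynamic modeling is challenging.
We demonstrate the proposed framework in a simulation-based experimental environment in which the tumors to be removed are close to each other, and show that the safety constraints are enforced.
We show that the model-free CBF-based controlled robot eliminates one tumor completely without damaging it, while not invading another nearby tumor.

Summary (original)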

There is growing interest in automating surgical tasks using robotic systems, such as endoscopy for treating gastrointestinal (GI) cancer. However, previous studies have primarily focused on detecting and analyzing objects or robots, with limited attention to ensuring safety, which is critical for clinical applications, where accidents can be caused by unsafe robot motions. In this study, we propose a new control framework that can formally ensure the safety of automating the cutting task in endoscopic submucosal dissection (ESD), a representative endoscopic surgical method for the treatment of early GI cancer, by using an endoscopic robot. The proposed framework utilizes Control Barrier Functions (CBFs) to accurately identify the boundaries of individual tumors, even in close proximity within the GI tract, ensuring precise treatment and removal while preserving the surrounding normal tissue. Additionally, by adopting a model-free control scheme, safety assurance is made possible even in endoscopic robotic systems where dynamic modeling is challenging. We demonstrate the proposed framework in a simulation-based experimental environment, where the tumors to be removed are close to each other, and show that the safety constraints are enforced. We show that the model-free CBF-based controlled robot eliminates one tumor completely without damaging it, while not invading another nearby tumor.

arXiv information

Authors	Yitaek Kim, Iñigo Iturrate, Christoffer Sloth, Hansoul Kim
Published	2025-06-12 07:24:51+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO, cs.SY, eess.SY
