jarxiv | Japanese arxiv | Page 452

Satellite Autonomous Clock Fault Monitoring with Inter-Satellite Ranges Using Euclidean Distance Matrices

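Posted: May 19, 2025 | Author: jarxiv

Summary

To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a new onboard clock phase jump detection framework for satellite constellations that uses range measurements obtained from dual one-way inter-satellite links.
Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, which gives flexibility to lunar satellite networks with diverse satellite types and operators.
We model satellite constellations as graphs in which satellites are vertices and inter-satellite links are edges.
The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique subgraphs.
The method is validated through simulations of a GPS constellation and a notional constellation around the Moon, showing its effectiveness in various configurations.

Summary (original)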

To address the need for robust positioning, navigation, and timing services in lunar environments, this paper proposes a novel onboard clock phase jump detection framework for satellite constellations using range measurements obtained from dual one-way inter-satellite links. Our approach leverages vertex redundantly rigid graphs to detect faults without relying on prior knowledge of satellite positions or clock biases, providing flexibility for lunar satellite networks with diverse satellite types and operators. We model satellite constellations as graphs, where satellites are vertices and inter-satellite links are edges. The proposed algorithm detects and identifies satellites with clock jumps by monitoring the singular values of the geometric-centered Euclidean distance matrix (GCEDM) of 5-clique sub-graphs. The proposed method is validated through simulations of a GPS constellation and a notional constellation around the Moon, demonstrating its effectiveness in various configurations.
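The rank argument behind monitoring GCEDM singular values can be sketched numerically. For five points in 3D space, the geometric-centered Euclidean distance matrix has rank at most 3, so its fourth and fifth singular values are numerically zero; a range bias on one vertex breaks that structure. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the random geometry, and the fault model (a clock jump appearing as a constant bias on every range that involves the faulty satellite) are all hypothetical.

```python
import numpy as np

def pairwise_ranges(positions):
    """Symmetric matrix of Euclidean ranges between all vertex pairs."""
    diff = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def gcedm_singular_values(ranges):
    """Singular values (descending) of the geometric-centered EDM."""
    n = ranges.shape[0]
    edm = ranges ** 2                          # EDM holds squared ranges
    centering = np.eye(n) - np.ones((n, n)) / n
    gcedm = -0.5 * centering @ edm @ centering
    return np.linalg.svd(gcedm, compute_uv=False)

rng = np.random.default_rng(0)
positions = 10.0 * rng.standard_normal((5, 3))  # a 5-clique in 3D (arbitrary units)
nominal = pairwise_ranges(positions)

# Assumed fault model: satellite k adds the same bias to all of its ranges.
k, bias = 2, 0.01
faulty = nominal.copy()
faulty[k, :] += bias
faulty[:, k] += bias
faulty[k, k] = 0.0                              # self-range stays zero

s_nom = gcedm_singular_values(nominal)
s_fault = gcedm_singular_values(faulty)
print("4th singular value, nominal:", s_nom[3])
print("4th singular value, faulty: ", s_fault[3])
```

With real measurements the fourth singular value would be compared against a noise-dependent threshold rather than against zero, and identifying which satellite is faulty would require repeating the check over several subgraphs.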

arXiv information

Authors	Keidai Iiyama, Daniel Neamati, Grace Gao
Published	2025-05-16 08:01:15+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.MA, cs.RO, cs.SY, eess.SY

GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets

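Posted: May 19, 2025 | Author: jarxiv

Summary

Recent advances in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks.
However, applying these principles to legged locomotion remains challenging because of continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies.
In this work, we propose GROQLoco, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains while relying solely on offline datasets.
Our approach leverages expert demonstrations of two distinct locomotion behaviors, stair traversal (non-periodic gaits) and flat-terrain traversal (periodic gaits), collected across multiple quadruped robots, to train a generalist model that enables fusion of both behaviors.
Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings.
The policy can be deployed directly on an Intel i7 NUC and produces low-latency control outputs without any test-time optimization.
Our extensive experiments show strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12 kg robot.
In particular, we evaluate challenging cross-robot training setups in which different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal to all robots at test time.
We also show preliminary walking on flat and outdoor terrain with Stoch 5, a 70 kg quadruped, without any fine-tuning.
These results highlight the potential for robust generalist locomotion across diverse robots and terrains.

Summary (original)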

Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GROQLoco, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations from two distinct locomotion behaviors – stair traversal (non-periodic gaits) and flat terrain traversal (periodic gaits) – collected across multiple quadruped robots, to train a generalist model that enables behavior fusion for both behaviors. Crucially, our framework operates directly on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 NUC, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate strong zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12 kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70 kg quadruped, on flat and outdoor terrains without requiring any fine-tuning. These results highlight the potential for robust generalist locomotion across diverse robots and terrains.

arXiv information

Authors	Narayanan PP, Sarvesh Prasanth Venkatesan, Srinivas Kantha Reddy, Shishir Kolathaya
Published	2025-05-16 08:17:01+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO, I.2.9

RGB-Event Fusion with Self-Attention for Collision Prediction

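Posted: May 19, 2025 | Author: jarxiv

Summary

Ensuring robust, real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments.
This paper proposes a neural network framework that uses RGB and event-based vision sensors to predict the time and position of a collision between an unmanned aerial vehicle and a dynamic object.
The proposed architecture consists of two separate encoder branches, one per modality, followed by fusion through self-attention to improve prediction accuracy.
To facilitate benchmarking, we leverage the ABCD [8] dataset, which enables detailed comparisons of single-modality and fusion-based approaches.
At the same prediction throughput of 50 Hz, the experimental results show that the fusion-based model improves prediction accuracy over single-modality approaches by 1% on average and by 10% for distances beyond 0.5 m, at a cost of +71% in memory and +105% in FLOPs.
Notably, the event-based model outperforms the RGB model by 4% in position error and 26% in time error at a similar computational cost, making it a competitive alternative.
In addition, we evaluate quantized versions of the event-based model, applying 1- to 8-bit quantization to assess the trade-off between predictive performance and computational efficiency.
These findings highlight the trade-offs of multi-modal perception with RGB and event-based cameras in robotic applications.

Summary (original)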

Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. This paper proposes a neural network framework for predicting the time and collision position of an unmanned aerial vehicle with a dynamic object, using RGB and event-based vision sensors. The proposed architecture consists of two separate encoder branches, one for each modality, followed by fusion by self-attention to improve prediction accuracy. To facilitate benchmarking, we leverage the ABCD [8] dataset, which enables detailed comparisons of single-modality and fusion-based approaches. At the same prediction throughput of 50 Hz, the experimental results show that the fusion-based model offers an improvement in prediction accuracy over single-modality approaches of 1% on average and 10% for distances beyond 0.5 m, but comes at the cost of +71% in memory and +105% in FLOPs. Notably, the event-based model outperforms the RGB model by 4% for position and 26% for time error at a similar computational cost, making it a competitive alternative. Additionally, we evaluate quantized versions of the event-based models, applying 1- to 8-bit quantization to assess the trade-offs between predictive performance and computational efficiency. These findings highlight the trade-offs of multi-modal perception using RGB and event-based cameras in robotic applications.
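The "two encoder branches followed by fusion by self-attention" pattern can be illustrated with a few lines of NumPy. This is only a minimal sketch under assumptions: the abstract does not give layer sizes, the number of heads, or the pooling step, so the single-head attention, the random projection weights, the feature dimension `d`, and the mean pooling here are all hypothetical stand-ins for the encoders' real outputs and learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(tokens, w_q, w_k, w_v):
    """Single-head self-attention over modality tokens, then mean pooling."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    attended = weights @ v                    # every token attends to both modalities
    return attended.mean(axis=0), weights

rng = np.random.default_rng(0)
d = 8                                         # hypothetical feature size
rgb_feat = rng.standard_normal(d)             # stand-in for the RGB encoder output
event_feat = rng.standard_normal(d)           # stand-in for the event encoder output
tokens = np.stack([rgb_feat, event_feat])     # shape (2, d): one token per modality

w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = self_attention_fusion(tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a trained model the fused vector would feed a regression head that outputs the collision time and position; that head is omitted here.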

arXiv information

Authors	Pietro Bonazzi, Christian Vogt, Michael Jost, Haotong Qin, Lyes Khacef, Federico Paredes-Valles, Michele Magno
Published	2025-05-16 08:32:32+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.RO

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

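Posted: May 19, 2025 | Author: jarxiv

Summary

Garment manipulation is a critical challenge because of the diversity of garment categories, geometries, and deformations.
Even so, humans handle garments effortlessly thanks to the dexterity of our hands.
Existing research in this field, however, has struggled to replicate that level of dexterity, hindered mainly by the lack of realistic simulations of dexterous garment manipulation.
We therefore propose DexGarmentLab, the first environment designed specifically for dexterous (especially bimanual) garment manipulation, which features large-scale, high-quality 3D assets for 15 task scenarios and refines simulation techniques tailored to garment modeling to reduce the sim-to-real gap.
Previous data collection typically relies on teleoperation or on training expert reinforcement learning (RL) policies, both of which are labor-intensive and inefficient.
In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories from only a single expert demonstration, significantly reducing manual intervention.
However, even extensive demonstrations cannot cover the infinite states of garments, which makes the exploration of new algorithms necessary.
To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO).
It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task.
Through extensive experiments and detailed analysis of our method and baselines, we demonstrate that HALO consistently outperforms existing methods and generalizes successfully to previously unseen instances, even under significant variations in shape and deformation where other methods fail.
The project page is available at https://wayrise.github.io/DexGarmentLab/.

Summary (original)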

Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.

arXiv information

Authors	Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong
Published	2025-05-16 09:26:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV, cs.RO

PARSEC: Preference Adaptation for Robotic Object Rearrangement from Scene Context

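Posted: May 19, 2025 | Author: jarxiv

Summary

Object rearrangement is a key task for household robots; it requires personalization without explicit instructions, meaningful object placement in environments already occupied by objects, and generalization to unseen objects and new environments.
To facilitate research on these challenges, we introduce PARSEC, an object rearrangement benchmark for learning a user's organizational preferences from observed scene context in order to place objects in a partially arranged environment.
PARSEC is built on a new dataset of 110K rearrangement examples crowdsourced from 72 users, featuring 93 object categories and 15 environments.
We also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences from prior and current scene context while accounting for multiple valid placements.
We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark and complement these findings with a crowdsourced evaluation in which 108 online raters rank model predictions by their alignment with user preferences.
Our results indicate that personalized rearrangement models that leverage multiple sources of scene context perform better than models that rely on a single source.
Moreover, ContextSortLM outperforms other models at placing objects to replicate the target user's arrangement and ranks among the top two in all three environment categories, as rated by online evaluators.
Importantly, our evaluation highlights challenges in modeling environment semantics across different environment categories and provides recommendations for future work.

Summary (original)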

Object rearrangement is a key task for household robots requiring personalization without explicit instructions, meaningful object placement in environments occupied with objects, and generalization to unseen objects and new environments. To facilitate research addressing these challenges, we introduce PARSEC, an object rearrangement benchmark for learning user organizational preferences from observed scene context to place objects in a partially arranged environment. PARSEC is built upon a novel dataset of 110K rearrangement examples crowdsourced from 72 users, featuring 93 object categories and 15 environments. We also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences from prior and current scene context while accounting for multiple valid placements. We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark and complement these findings with a crowdsourced evaluation of 108 online raters ranking model predictions based on alignment with user preferences. Our results indicate that personalized rearrangement models leveraging multiple scene context sources perform better than models relying on a single context source. Moreover, ContextSortLM outperforms other models in placing objects to replicate the target user’s arrangement and ranks among the top two in all three environment categories, as rated by online evaluators. Importantly, our evaluation highlights challenges associated with modeling environment semantics across different environment categories and provides recommendations for future work.

arXiv information

Authors	Kartik Ramachandruni, Sonia Chernova
Published	2025-05-16 10:40:44+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.RO

Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow

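Posted: May 19, 2025 | Author: jarxiv

Summary

Accurate velocity estimation is critical in mobile robotics, particularly for driver assistance systems and autonomous driving.
Wheel odometry fused with Inertial Measurement Unit (IMU) data is a widely used method for velocity estimation.
However, it typically requires strong assumptions, such as non-slip steering, or complex vehicle dynamics models that do not hold under varying environmental conditions such as slippery surfaces.
We introduce an approach to velocity estimation that is decoupled from wheel-to-surface traction assumptions by combining planar kinematics with optical flow from event cameras pointed perpendicularly at the ground.
The asynchronous microsecond latency and high dynamic range of event cameras make them highly robust to motion blur, a common challenge for vision-based perception in autonomous driving.
The proposed method is evaluated in in-field experiments on a 1:10 scale autonomous racing platform and compared against precise motion capture data, demonstrating performance on par with the state-of-the-art Event-VIO method as well as a 38.3% improvement in lateral error.
Qualitative experiments at highway speeds of up to 32 m/s further confirm the effectiveness of the approach and indicate significant potential for real-world deployment.

Summary (original)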

Accurate velocity estimation is critical in mobile robotics, particularly for driver assistance systems and autonomous driving. Wheel odometry fused with Inertial Measurement Unit (IMU) data is a widely used method for velocity estimation; however, it typically requires strong assumptions, such as non-slip steering, or complex vehicle dynamics models that do not hold under varying environmental conditions like slippery surfaces. We introduce an approach to velocity estimation that is decoupled from wheel-to-surface traction assumptions by leveraging planar kinematics in combination with optical flow from event cameras pointed perpendicularly at the ground. The asynchronous micro-second latency and high dynamic range of event cameras make them highly robust to motion blur, a common challenge in vision-based perception techniques for autonomous driving. The proposed method is evaluated through in-field experiments on a 1:10 scale autonomous racing platform and compared to precise motion capture data, demonstrating not only performance on par with the state-of-the-art Event-VIO method but also a 38.3 % improvement in lateral error. Qualitative experiments at highway speeds of up to 32 m/s further confirm the effectiveness of our approach, indicating significant potential for real-world deployment.
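The idea of combining planar kinematics with ground-facing optical flow can be sketched as a small least-squares problem. The sketch assumes an ideal pinhole camera with focal length `focal_px` (pixels) at a known height `height_m` above flat ground, an optical axis perpendicular to the ground, and pixel coordinates measured from the principal point. Under those assumptions a static ground point seen at pixel (x, y) produces flow u = -(f/h)·vx + ω·y and v = -(f/h)·vy - ω·x, where (vx, vy) is the camera's planar velocity and ω its yaw rate. The sign convention, the function name, and the synthetic data are illustrative choices, not the authors' pipeline.

```python
import numpy as np

def estimate_planar_velocity(pixels, flow, focal_px, height_m):
    """Least-squares estimate of (vx, vy, yaw_rate) from ground optical flow.

    pixels: (n, 2) pixel coordinates relative to the principal point
    flow:   (n, 2) optical flow in pixels per second at those pixels
    """
    x, y = pixels[:, 0], pixels[:, 1]
    scale = focal_px / height_m               # metres on the ground -> pixels
    n = len(pixels)
    A = np.zeros((2 * n, 3))
    A[0::2, 0] = -scale                       # u rows: -(f/h) * vx + omega * y
    A[0::2, 2] = y
    A[1::2, 1] = -scale                       # v rows: -(f/h) * vy - omega * x
    A[1::2, 2] = -x
    b = flow.reshape(-1)                      # interleaved [u0, v0, u1, v1, ...]
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution

# Synthetic check with made-up camera parameters and motion.
rng = np.random.default_rng(0)
focal_px, height_m = 400.0, 0.1
true_motion = np.array([3.0, -0.5, 0.8])      # vx [m/s], vy [m/s], yaw rate [rad/s]
pixels = rng.uniform(-100.0, 100.0, size=(50, 2))
scale = focal_px / height_m
flow = np.stack([
    -scale * true_motion[0] + true_motion[2] * pixels[:, 1],
    -scale * true_motion[1] - true_motion[2] * pixels[:, 0],
], axis=1)

estimate = estimate_planar_velocity(pixels, flow, focal_px, height_m)
print("estimated (vx, vy, yaw rate):", estimate)
```

Because the model is linear in the three unknowns, no wheel or traction model enters the estimate; in practice the flow would be noisy and a robust solver would likely replace plain least squares.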

arXiv information

Authors	Liam Boyle, Jonas Kühne, Nicolas Baumann, Niklas Bastuck, Michele Magno
Published	2025-05-16 11:00:33+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.RO

Conditioning Matters: Training Diffusion Policies is Faster Than You Think

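Posted: May 19, 2025 | Author: jarxiv

Summary

Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models.
Although they show strong robot control capabilities, their training efficiency remains suboptimal.
In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse.
To overcome this, we propose Cocos, a simple yet general solution that makes the source distribution in conditional flow matching condition-dependent.
By anchoring the source distribution around semantics extracted from the condition inputs, Cocos encourages stronger condition integration and prevents loss collapse.
We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks.
Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs with significantly fewer gradient steps and parameters.
Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.

Summary (original)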

Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse. To overcome this, we propose Cocos, a simple yet general solution that modifies the source distribution in the conditional flow matching to be condition-dependent. By anchoring the source distribution around semantics extracted from condition inputs, Cocos encourages stronger condition integration and prevents the loss collapse. We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks. Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs using significantly fewer gradient steps and parameters. Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.
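The difference between a fixed and a condition-dependent source distribution can be made concrete with a flow-matching training pair. The sketch below assumes the common linear interpolation path x_t = (1 - t)·x0 + t·x1 with target velocity x1 - x0, and it assumes the condition-dependent source is a Gaussian centred on a scaled condition embedding. The abstract only states that the source is anchored around semantics extracted from the condition inputs, so the Gaussian form, the `alpha` and `beta` parameters, and the requirement that the embedding has the same shape as the action are all illustrative assumptions.

```python
import numpy as np

def sample_source(cond_embedding, rng, alpha=1.0, beta=0.2, condition_dependent=True):
    """Draw x0 from N(alpha * embedding, beta^2 I), or from N(0, I) as the baseline."""
    noise = rng.standard_normal(cond_embedding.shape)
    if condition_dependent:
        return alpha * cond_embedding + beta * noise
    return noise

def flow_matching_pair(x0, x1, t):
    """Interpolated sample x_t and the regression target for the velocity field."""
    t = t[:, None]                            # broadcast over the action dimension
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return x_t, target_velocity

def flow_matching_loss(pred_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((pred_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
batch, action_dim = 4, 6
cond_embedding = rng.standard_normal((batch, action_dim))  # stand-in for an encoder output
actions = rng.standard_normal((batch, action_dim))         # x1: expert actions
t = rng.uniform(0.0, 1.0, size=batch)

x0 = sample_source(cond_embedding, rng)
x_t, target = flow_matching_pair(x0, actions, t)
dummy_prediction = np.zeros_like(target)                   # placeholder for a network output
print("loss with a zero predictor:", flow_matching_loss(dummy_prediction, target))
```

With the baseline source, x0 carries no information about the condition, so a network can reduce the loss while ignoring it; centring x0 on the embedding makes the starting point itself differ between conditions.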

arXiv information

Authors	Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, Jianye Hao
Published	2025-05-16 11:14:22+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.RO

Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design

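Posted: May 19, 2025 | Author: jarxiv

Summary

We propose a novel reinforcement learning (RL) design to optimize the charging strategy for autonomous mobile robots in large-scale block stacking warehouses.
RL design involves a wide array of choices that can mostly be evaluated only through lengthy experimentation.
Our study focuses on how different reward and action space configurations, ranging from flexible setups to more guided, domain-informed designs, affect agent performance.
Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times.
Furthermore, our findings highlight a trade-off: more open-ended designs can discover well-performing strategies on their own but may need longer convergence times and are less stable, whereas guided configurations lead to a more stable learning process but show more limited generalization potential.
Our contributions are threefold.
First, we extend SLAPStack, an open-source, RL-compatible simulation framework, to accommodate charging strategies.
Second, we introduce a novel RL design for tackling the charging strategy problem.
Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate the design using a Proximal Policy Optimization agent across varying design configurations, with a focus on reward.

Summary (original)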

We propose a novel reinforcement learning (RL) design to optimize the charging strategy for autonomous mobile robots in large-scale block stacking warehouses. RL design involves a wide array of choices that can mostly only be evaluated through lengthy experimentation. Our study focuses on how different reward and action space configurations, ranging from flexible setups to more guided, domain-informed design configurations, affect the agent performance. Using heuristic charging strategies as a baseline, we demonstrate the superiority of flexible, RL-based approaches in terms of service times. Furthermore, our findings highlight a trade-off: While more open-ended designs are able to discover well-performing strategies on their own, they may require longer convergence times and are less stable, whereas guided configurations lead to a more stable learning process but display a more limited generalization potential. Our contributions are threefold. First, we extend SLAPStack, an open-source, RL-compatible simulation framework, to accommodate charging strategies. Second, we introduce a novel RL design for tackling the charging strategy problem. Finally, we introduce several novel adaptive baseline heuristics and reproducibly evaluate the design using a Proximal Policy Optimization agent and varying design configurations, with a focus on reward.

arXiv information

Authors	Janik Bischoff, Alexandru Rinciog, Anne Meyer
Published	2025-05-16 11:33:29+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.RO

Open-Source Multi-Viewpoint Surgical Telerobotics

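Posted: May 19, 2025 | Author: jarxiv

Summary

As robots for minimally invasive surgery (MIS) gradually become more accessible and modular, we believe there is a great opportunity to rethink and expand the visualization and control paradigms that have characterized surgical teleoperation since its inception.
We conjecture that introducing one or more additional adjustable viewpoints in the abdominal cavity would not only unlock novel visualization and collaboration strategies for surgeons but also substantially boost the robustness of machine perception toward shared autonomy.
Immediate advantages include controlling a second viewpoint and teleoperating surgical tools from a different perspective, which would let collaborating surgeons adjust their views independently and still maneuver their robotic instruments intuitively.
Furthermore, we believe that capturing synchronized multi-view 3D measurements of the patient's anatomy would unlock advanced scene representations.
Accurate real-time intraoperative 3D perception will allow algorithmic assistants to directly control one or more robotic instruments and/or robotic cameras.
Toward these goals, we are building a synchronized multi-viewpoint, multi-sensor robotic surgery system by integrating high-performance vision components and upgrading the da Vinci Research Kit control logic.
This short paper reports a functional summary of our setup and elaborates on its potential impact on research and future clinical practice.
By fully open-sourcing our system, we will enable the research community to reproduce our setup, improve it, and develop powerful algorithms, effectively boosting the clinical translation of cutting-edge research.

Summary (original)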

As robots for minimally invasive surgery (MIS) gradually become more accessible and modular, we believe there is a great opportunity to rethink and expand the visualization and control paradigms that have characterized surgical teleoperation since its inception. We conjecture that introducing one or more additional adjustable viewpoints in the abdominal cavity would not only unlock novel visualization and collaboration strategies for surgeons but also substantially boost the robustness of machine perception toward shared autonomy. Immediate advantages include controlling a second viewpoint and teleoperating surgical tools from a different perspective, which would allow collaborating surgeons to adjust their views independently and still maneuver their robotic instruments intuitively. Furthermore, we believe that capturing synchronized multi-view 3D measurements of the patient’s anatomy would unlock advanced scene representations. Accurate real-time intraoperative 3D perception will allow algorithmic assistants to directly control one or more robotic instruments and/or robotic cameras. Toward these goals, we are building a synchronized multi-viewpoint, multi-sensor robotic surgery system by integrating high-performance vision components and upgrading the da Vinci Research Kit control logic. This short paper reports a functional summary of our setup and elaborates on its potential impacts in research and future clinical practice. By fully open-sourcing our system, we will enable the research community to reproduce our setup, improve it, and develop powerful algorithms, effectively boosting clinical translation of cutting-edge research.

arXiv information

Authors	Guido Caccianiga, Yarden Sharon, Bernard Javot, Senya Polikovsky, Gökce Ergün, Ivan Capobianco, André L. Mihaljevic, Anton Deguet, Katherine J. Kuchenbecker
Published	2025-05-16 11:41:27+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.RO

X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation

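Posted: May 19, 2025 | Author: jarxiv

Summary

The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication.
However, the lack of datasets containing diverse humanoid facial expressions with proper annotations hinders progress in realistic humanoid facial expression imitation.
To address these challenges, we introduce X2C (Anything to Control), a dataset featuring nuanced facial expressions for realistic humanoid imitation.
With X2C, we contribute: 1) a high-quality, high-diversity, large-scale dataset comprising 100,000 (image, control value) pairs.
Each image depicts a humanoid robot displaying a diverse range of facial expressions and is annotated with 30 control values that represent the ground-truth expression configuration.
2) X2CNet, a novel human-to-humanoid facial expression imitation framework that learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C.
It enables facial expression imitation in the wild for different human performers, provides a baseline for the imitation task, and showcases the potential value of the dataset.
3) Real-world demonstrations on a physical humanoid robot, highlighting its capability to advance realistic humanoid facial expression imitation.
Code and data: https://lipzh5.github.io/X2CNet/

Summary (original)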

The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication. However, the lack of datasets containing diverse humanoid facial expressions with proper annotations hinders progress in realistic humanoid facial expression imitation. To address these challenges, we introduce X2C (Anything to Control), a dataset featuring nuanced facial expressions for realistic humanoid imitation. With X2C, we contribute: 1) a high-quality, high-diversity, large-scale dataset comprising 100,000 (image, control value) pairs. Each image depicts a humanoid robot displaying a diverse range of facial expressions, annotated with 30 control values representing the ground-truth expression configuration; 2) X2CNet, a novel human-to-humanoid facial expression imitation framework that learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C. It enables facial expression imitation in the wild for different human performers, providing a baseline for the imitation task, showcasing the potential value of our dataset; 3) real-world demonstrations on a physical humanoid robot, highlighting its capability to advance realistic humanoid facial expression imitation. Code and Data: https://lipzh5.github.io/X2CNet/

arXiv information

Authors	Peizhen Li, Longbing Cao, Xiao-Ming Wu, Runze Yang, Xiaohan Yu
Published	2025-05-16 11:48:19+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.HC, cs.RO
