
BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation

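Posted on March 24, 2025 by jarxiv

Summary

Applying imitation learning (IL) to nonprehensile manipulation of invisible objects under partial observation, such as excavating buried rocks, is challenging.
The demonstrator must make complex action decisions, combining exploratory actions to find the object with task-oriented actions to complete the task while estimating the hidden state, which can lead to inconsistent demonstrations and high cognitive load.
Research in human cognitive science suggests that encouraging demonstrators to use pre-designed, simple exploration rules may alleviate both the action inconsistency and the high cognitive load.
When performing imitation learning from demonstrations that use such exploration rules, it is therefore important to accurately imitate not only the demonstrator's task-oriented behavior but also their mode-switching behavior (exploratory or task-oriented) under partial observation.
Based on these considerations, this paper proposes a novel imitation learning framework called Belief Exploration-Action Cloning (BEAC), whose policy structure switches between a pre-designed exploration policy and a task-oriented action policy trained on belief states estimated from past history.
In simulation and real-robot experiments, the proposed method achieved the best task performance and higher mode and action prediction accuracies, while a user study indicated reduced cognitive load during demonstration.

Summary (original)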

Applying imitation learning (IL) to nonprehensile manipulation tasks of invisible objects with partial observations, such as excavating buried rocks, is challenging. The demonstrator must make complex action decisions, such as exploring to find the object and taking task-oriented actions to complete the task while estimating its hidden state, which may cause inconsistent action demonstrations and high cognitive load. For these problems, work in human cognitive science suggests that promoting the use of pre-designed, simple exploration rules for the demonstrator may alleviate the problems of action inconsistency and high cognitive load. Therefore, when performing imitation learning from demonstrations using such exploration rules, it is important to accurately imitate not only the demonstrator's task-oriented behavior but also their mode-switching behavior (exploratory or task-oriented behavior) under partial observation. Based on the above considerations, this paper proposes a novel imitation learning framework called Belief Exploration-Action Cloning (BEAC), which has a switching policy structure between a pre-designed exploration policy and a task-oriented action policy trained on the estimated belief states based on past history. In simulation and real robot experiments, we confirmed that our proposed method achieved the best task performance and higher mode and action prediction accuracies, while reducing the cognitive load in the demonstration, as indicated by a user study.
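
The switching structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the belief computation, the fixed threshold (standing in for a learned mode predictor), and every name below (`belief_from_history`, `sweep_rule`, `cloned_task_policy`) are hypothetical and chosen only to show how a pre-designed exploration rule and a task-oriented policy might be selected from a belief over past observations.

```python
from typing import Callable, List, Tuple

Observation = float  # hypothetical: a contact reading, > 0 means "touched something"
Action = str


def belief_from_history(history: List[Observation]) -> float:
    """Toy belief state: fraction of past observations that registered contact."""
    if not history:
        return 0.0
    return sum(1 for o in history if o > 0) / len(history)


def sweep_rule(step: int) -> Action:
    """Pre-designed exploration rule (assumed): alternate left and right sweeps."""
    return "sweep_left" if step % 2 == 0 else "sweep_right"


def cloned_task_policy(belief: float) -> Action:
    """Stand-in for a task-oriented policy that would be cloned from demonstrations."""
    return "push_object"


def switching_policy(
    history: List[Observation],
    explore: Callable[[int], Action],
    act: Callable[[float], Action],
    threshold: float = 0.5,
) -> Tuple[str, Action]:
    """Pick the mode from the belief, then delegate to the matching policy.

    The threshold is an assumption; a learned mode predictor would replace it.
    """
    belief = belief_from_history(history)
    if belief < threshold:
        return "explore", explore(len(history))
    return "task", act(belief)
```

With an empty history the sketch stays in exploration mode; once most past observations register contact, it hands control to the task-oriented policy.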

arXiv information

Authors	Hirotaka Tahara, Takamitsu Matsubara
Published	2025-03-21 02:26:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, cs.RO

DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation

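Posted on March 24, 2025 by jarxiv

Summary

Nonprehensile manipulation is crucial for handling objects that are too thin, too large, or otherwise ungraspable in unstructured environments.
Conventional planning-based approaches struggle with complex contact modeling, and learning-based methods have recently emerged as a promising alternative.
However, existing learning-based approaches face two major limitations: they rely heavily on multi-view cameras and precise pose tracking, and they fail to generalize across varying physical conditions such as changes in object mass and table friction.
To address these challenges, we propose the Dynamics-Adaptive World Action Model (DyWA), a novel framework that enhances action learning by jointly predicting future states while adapting to dynamics variations based on historical trajectories.
By unifying the modeling of geometry, state, physics, and robot actions, DyWA enables more robust policy learning under partial observability.
Compared to baselines, our method improves the success rate by 31.5% in simulation using only single-view point cloud observations.
Furthermore, DyWA achieves an average success rate of 68% in real-world experiments, demonstrating that it generalizes across diverse object geometries, adapts to varying table friction, and remains robust in challenging scenarios such as half-filled water bottles and slippery surfaces.

Summary (original)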

Nonprehensile manipulation is crucial for handling objects that are too thin, large, or otherwise ungraspable in unstructured environments. While conventional planning-based approaches struggle with complex contact modeling, learning-based methods have recently emerged as a promising alternative. However, existing learning-based approaches face two major limitations: they heavily rely on multi-view cameras and precise pose tracking, and they fail to generalize across varying physical conditions, such as changes in object mass and table friction. To address these challenges, we propose the Dynamics-Adaptive World Action Model (DyWA), a novel framework that enhances action learning by jointly predicting future states while adapting to dynamics variations based on historical trajectories. By unifying the modeling of geometry, state, physics, and robot actions, DyWA enables more robust policy learning under partial observability. Compared to baselines, our method improves the success rate by 31.5% using only single-view point cloud observations in simulation. Furthermore, DyWA achieves an average success rate of 68% in real-world experiments, demonstrating its ability to generalize across diverse object geometries, adapt to varying table friction, and remain robust in challenging scenarios such as half-filled water bottles and slippery surfaces.

arXiv information

Authors	Jiangran Lyu, Ziming Li, Xuesong Shi, Chaoyi Xu, Yizhou Wang, He Wang
Published	2025-03-21 02:29:52+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.RO

SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion

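Posted on March 24, 2025 by jarxiv

Summary

Camera-based solutions have recently been explored extensively for semantic scene completion (SSC).
Despite their success in visible areas, existing methods struggle to capture complete scene semantics because of frequent visual occlusions.
To address this limitation, this paper presents SGFormer, the first satellite-ground cooperative SSC framework, and explores the potential of satellite-ground image pairs for the SSC task.
Specifically, we propose a dual-branch architecture that encodes orthogonal satellite and ground views in parallel and unifies them into a common domain.
Additionally, we design a ground-view guidance strategy that corrects satellite image biases during feature encoding, addressing misalignment between satellite and ground views.
Moreover, we develop an adaptive weighting strategy that balances the contributions of the satellite and ground views.
Experiments show that SGFormer outperforms the state of the art on the SemanticKITTI and SSCBench-KITTI-360 datasets.
Our code is available at https://github.com/gxytcrc/SGFormer.

Summary (original)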

Recently, camera-based solutions have been extensively explored for semantic scene completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics due to frequent visual occlusions. To address this limitation, this paper presents the first satellite-ground cooperative SSC framework, i.e., SGFormer, exploring the potential of satellite-ground image pairs in the SSC task. Specifically, we propose a dual-branch architecture that encodes orthogonal satellite and ground views in parallel, unifying them into a common domain. Additionally, we design a ground-view guidance strategy that corrects satellite image biases during feature encoding, addressing misalignment between satellite and ground views. Moreover, we develop an adaptive weighting strategy that balances contributions from satellite and ground views. Experiments demonstrate that SGFormer outperforms the state of the art on the SemanticKITTI and SSCBench-KITTI-360 datasets. Our code is available at https://github.com/gxytcrc/SGFormer.

arXiv information

Authors	Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang
Published	2025-03-21 03:37:08+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.CV, cs.RO

An Integrated Approach to Robotic Object Grasping and Manipulation

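Posted on March 24, 2025 by jarxiv

Summary

In response to the growing challenges of manual labor and efficiency in warehouse operations, Amazon has embarked on a significant transformation by incorporating robotics to assist with various tasks.
A substantial number of robots have been successfully deployed for tasks such as item transportation within warehouses, but the complex process of picking objects from shelves remains a significant challenge.
This project addresses the problem by developing a robotic system that can autonomously fulfill a simulated order by efficiently selecting specific items from shelves.
A distinguishing feature of the proposed system is its ability to handle uncertain object positions within each bin of the shelf.
The system is designed to adapt its approach autonomously, employing strategies that let it efficiently locate and retrieve the desired items even without prior knowledge of their placement.

Summary (original)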

In response to the growing challenges of manual labor and efficiency in warehouse operations, Amazon has embarked on a significant transformation by incorporating robotics to assist with various tasks. While a substantial number of robots have been successfully deployed for tasks such as item transportation within warehouses, the complex process of object picking from shelves remains a significant challenge. This project addresses the issue by developing an innovative robotic system capable of autonomously fulfilling a simulated order by efficiently selecting specific items from shelves. A distinguishing feature of the proposed robotic system is its capacity to navigate the challenge of uncertain object positions within each bin of the shelf. The system is engineered to autonomously adapt its approach, employing strategies that enable it to efficiently locate and retrieve the desired items, even in the absence of pre-established knowledge about their placements.

arXiv information

Authors	Owais Ahmed, M Huzaifa, M Areeb, Hamza Ali Khan
Published	2025-03-21 04:00:22+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.CV, cs.RO

Safe On-Orbit Dislodging of Deployable Structures via Robust Adaptive MPC

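Posted on March 24, 2025 by jarxiv

Summary

This paper proposes a novel robust adaptive model predictive controller for on-orbit dislodging.
We consider a scenario in which a servicer equipped with a robot arm must dislodge a client, a time-varying system composed of an underpowered, jammed solar panel with a hybrid hinge system on a space station.
Our approach leverages online set-membership identification to reduce uncertainty and provide robust safety guarantees during dislodging despite bounded disturbances, while effectively balancing exploration and exploitation in the parameter space.
The feasibility of the developed robust adaptive MPC method is examined through dislodging simulations in a zero-gravity environment and hardware experiments in a gravity environment.
In addition, the advantages of the method are shown through comparison experiments with several state-of-the-art control schemes, in terms of both parameter estimation accuracy and control performance.

Summary (original)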

This paper proposes a novel robust adaptive model predictive controller for on-orbit dislodging. We consider the scenario where a servicer, equipped with a robot arm, must dislodge a client, a time-varying system composed of an underpowered jammed solar panel with a hybrid hinge system on a space station. Our approach leverages online set-membership identification to reduce the uncertainty to provide robust safety guarantees during dislodging despite bounded disturbances while balancing exploration and exploitation effectively in the parameter space. The feasibility of the developed robust adaptive MPC method is also examined through dislodging simulations and hardware experiments in zero-gravity and gravity environments, respectively. In addition, the advantages of our method are shown through comparison experiments with several state-of-the-art control schemes for both accuracy of parameter estimation and control performance.
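
As a rough illustration of set-membership identification in general (not the controller from this paper), the sketch below tracks the interval of values of a single unknown parameter that remain consistent with noisy measurements. It assumes a scalar model `y = theta * x + w` with a known disturbance bound `|w| <= w_bar`; the model, the function names, and the sample data are all hypothetical.

```python
from typing import Iterable, Tuple


def update_interval(
    lo: float, hi: float, x: float, y: float, w_bar: float
) -> Tuple[float, float]:
    """Intersect the feasible interval [lo, hi] for theta with one measurement.

    Assumed model: y = theta * x + w with |w| <= w_bar, so theta must lie
    between (y - w_bar) / x and (y + w_bar) / x whenever x != 0.
    """
    if x == 0:
        return lo, hi  # this sample carries no information about theta
    a, b = (y - w_bar) / x, (y + w_bar) / x
    if a > b:  # dividing by a negative x flips the bounds
        a, b = b, a
    new_lo, new_hi = max(lo, a), min(hi, b)
    if new_lo > new_hi:
        raise ValueError("empty feasible set: model or disturbance bound violated")
    return new_lo, new_hi


def identify(
    samples: Iterable[Tuple[float, float]], lo: float, hi: float, w_bar: float
) -> Tuple[float, float]:
    """Shrink the feasible interval with each (x, y) sample in turn."""
    for x, y in samples:
        lo, hi = update_interval(lo, hi, x, y, w_bar)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data generated with theta = 2.0 and |w| <= 0.5.
    data = [(1.0, 2.3), (2.0, 3.8), (-1.0, -2.4)]
    print(identify(data, 0.0, 10.0, 0.5))
```

Because the interval can only shrink and always contains the true parameter when the disturbance bound holds, a robust controller can plan against the whole interval instead of a single point estimate.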

arXiv information

Authors	Longsen Gao, Claus Danielson, Andrew Kwas, Rafael Fierro
Published	2025-03-21 04:40:04+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO, cs.SY, eess.SY

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

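Posted on March 24, 2025 by jarxiv

Summary

This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders.
In domestic settings, aligning task planning with user reminders poses significant challenges because the reminders are limited in quantity, diverse, and multimodal.
To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model that functions as an adapter for GPT-4o.
This adapter model internalizes diverse forms of user reminders, such as personalized preferences, corrective guidance, and contextual assistance, into structured instruction-formatted cues that prompt GPT-4o when it generates customized task plans.
In addition, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further improving task planning accuracy.
To validate AlignBot's effectiveness, experiments are conducted in real-world household environments constructed within the laboratory to replicate typical household settings.
A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation.
The results show that AlignBot significantly improves customized task planning and outperforms existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving an 86.8% success rate compared with 21.6% for the vanilla GPT-4o baseline.
Supplementary materials are available at https://yding25.com/AlignBot/.

Summary (original)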

This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders (such as personalized preferences, corrective guidance, and contextual assistance) into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving an 86.8% success rate compared to 21.6% for the vanilla GPT-4o baseline, a gain of about 65 percentage points and over four times the baseline success rate. Supplementary materials are available at: https://yding25.com/AlignBot/
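
The two ways of comparing the reported success rates (86.8% for AlignBot, 21.6% for the vanilla GPT-4o baseline) can be checked with a few lines of arithmetic. The helper name `compare_success_rates` is hypothetical; only the two percentages come from the abstract.

```python
from typing import Dict


def compare_success_rates(method_pct: float, baseline_pct: float) -> Dict[str, float]:
    """Return the absolute gain in percentage points and the ratio to the baseline."""
    return {
        "point_gain": method_pct - baseline_pct,  # absolute difference, in points
        "ratio": method_pct / baseline_pct,       # how many times the baseline rate
    }


if __name__ == "__main__":
    result = compare_success_rates(86.8, 21.6)
    # The gain is about 65.2 percentage points and the ratio is about 4.02.
    print(f"gain: {result['point_gain']:.1f} points, ratio: {result['ratio']:.2f}x")
```

The 65-point figure is an absolute difference between the two rates, while "over four times" is their ratio, so the two statements describe the same result in different terms.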

arXiv information

Authors	Zhaxizhuoma Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
Published	2025-03-21 04:40:24+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.IR, cs.RO

HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views

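Posted on March 24, 2025 by jarxiv

Summary

We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments.
We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities.
To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows that reflect the underlying distribution during attention.
We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost.
Our pyramid attentional pooling then synthesizes a robust global descriptor for end-to-end place recognition in challenging environments.
In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests.
Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making the dataset a challenging benchmark for cross-view localization in the wild.
HOTFormerLoc achieves a top-1 average recall improvement of 5.5% to 11.5% on the CS-Wild-Places benchmark.
Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 4.9% on well-established urban and forest datasets.
The code and the CS-Wild-Places benchmark are available at https://csiro-robotics.github.io/HOTFormerLoc.

Summary (original)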

We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based TransFormer, for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows to reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. HOTFormerLoc achieves a top-1 average recall improvement of 5.5% to 11.5% on the CS-Wild-Places benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 4.9% on well-established urban and forest datasets. The code and CS-Wild-Places benchmark are available at https://csiro-robotics.github.io/HOTFormerLoc.

arXiv information

Authors	Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani
Published	2025-03-21 07:00:11+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.CV, cs.RO

Incremental Learning for Robot Shared Autonomy

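Posted on March 24, 2025 by jarxiv

Summary

Shared autonomy holds promise for improving the usability and accessibility of assistive robotic arms, but current methods often rely on costly expert demonstrations and remain static after pretraining, which limits their ability to handle real-world variations.
Even with extensive training data, unforeseen challenges, especially those that fundamentally alter task dynamics such as unexpected obstacles or spatial constraints, can cause assistive policies to break down, leading to ineffective or unreliable assistance.
To address this, we propose ILSA, an Incrementally Learned Shared Autonomy framework that continuously refines its assistive policy through user interactions, adapting to real-world challenges beyond the scope of pre-collected data.
At the core of ILSA is a structured fine-tuning mechanism that enables continual improvement with each interaction by effectively integrating limited new interaction data while preserving prior knowledge, ensuring a balance between adaptation and generalization.
A user study with 20 participants demonstrates ILSA's effectiveness, showing faster task completion and improved user experience compared to static alternatives.
Code and videos are available at https://ilsa-robo.github.io/.

Summary (original)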

Shared autonomy holds promise for improving the usability and accessibility of assistive robotic arms, but current methods often rely on costly expert demonstrations and remain static after pretraining, limiting their ability to handle real-world variations. Even with extensive training data, unforeseen challenges (especially those that fundamentally alter task dynamics, such as unexpected obstacles or spatial constraints) can cause assistive policies to break down, leading to ineffective or unreliable assistance. To address this, we propose ILSA, an Incrementally Learned Shared Autonomy framework that continuously refines its assistive policy through user interactions, adapting to real-world challenges beyond the scope of pre-collected data. At the core of ILSA is a structured fine-tuning mechanism that enables continual improvement with each interaction by effectively integrating limited new interaction data while preserving prior knowledge, ensuring a balance between adaptation and generalization. A user study with 20 participants demonstrates ILSA's effectiveness, showing faster task completion and improved user experience compared to static alternatives. Code and videos are available at https://ilsa-robo.github.io/.

arXiv information

Authors	Yiran Tao, Guixiu Qiao, Dan Ding, Zackory Erickson
Published	2025-03-21 07:05:12+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.RO

Deep Learning for Human Locomotion Analysis in Lower-Limb Exoskeletons: A Comparative Study

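Posted on March 24, 2025 by jarxiv

Summary

Wearable robotics for lower-limb assistance has become a pivotal area of research, aiming to enhance mobility for individuals with physical impairments or to augment the performance of able-bodied users.
Accurate and adaptive control systems are essential to ensure seamless interaction between the wearer and the robotic device, particularly when navigating diverse and dynamic terrains.
Despite recent advances in neural networks for time-series analysis, no prior work has addressed classifying ground conditions into five classes and subsequently estimating the ramp slope and stair height.
This paper therefore presents an experimental comparison of eight deep neural network backbones for predicting high-level locomotion parameters across diverse terrains.
All models are trained on the publicly available CAMARGO 2021 dataset.
IMU-only data matched or outperformed IMU+EMG inputs, favoring a cost-effective and efficient design.
Indeed, using three IMU sensors, the LSTM achieved high terrain classification accuracy (0.94 ± 0.04) and precise ramp slope estimation (1.95 ± 0.58°), while the CNN-LSTM achieved a stair height estimation of 15.65 ± 7.40 mm.
As a further contribution, SHAP analysis justified sensor reduction without performance loss, ensuring a lightweight setup.
The system operates with an inference time of about 2 ms, supporting real-time applications.
The code is available at https://github.com/cosbidev/Human-Locomotion-Identification.

Summary (original)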

Wearable robotics for lower-limb assistance has become a pivotal area of research, aiming to enhance mobility for individuals with physical impairments or augment the performance of able-bodied users. Accurate and adaptive control systems are essential to ensure seamless interaction between the wearer and the robotic device, particularly when navigating diverse and dynamic terrains. Despite the recent advances in neural networks for time series analysis, no attempts have been directed towards classifying ground conditions into five classes and subsequently determining the ramp's slope and the stair's height. In this respect, this paper presents an experimental comparison between eight deep neural network backbones to predict high-level locomotion parameters across diverse terrains. All the models are trained on the publicly available CAMARGO 2021 dataset. IMU-only data matched or outperformed IMU+EMG inputs, promoting a cost-effective and efficient design. Indeed, using three IMU sensors, the LSTM achieved high terrain classification accuracy (0.94 ± 0.04) and precise ramp slope estimation (1.95 ± 0.58°), and the CNN-LSTM a stair height estimation of 15.65 ± 7.40 mm. As a further contribution, SHAP analysis justified sensor reduction without performance loss, ensuring a lightweight setup. The system operates with ~2 ms inference time, supporting real-time applications. The code is available at https://github.com/cosbidev/Human-Locomotion-Identification.

arXiv information

Authors	Omar Coser, Christian Tamantini, Matteo Tortora, Leonardo Furia, Rosa Sicilia, Loredana Zollo, Paolo Soda
Published	2025-03-21 07:12:44+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.RO, F.2.2, I.2.7

GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation

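Posted on March 24, 2025 by jarxiv

Summary

Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence.
Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection.
However, in real-world environments these methods often face challenges caused by imperfect depth perception, for example with transparent lids and reflective handles.
Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation.
To address these challenges, we introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses.
We evaluate the effectiveness of the dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction.
Additionally, we propose a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation.
Our extensive experiments demonstrate that the dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios.
More information and demos can be found at https://pku-epic.github.io/GAPartManip/.

Summary (original)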

Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios. More information and demos can be found at: https://pku-epic.github.io/GAPartManip/.

arXiv information

Authors	Wenbo Cui, Chengyang Zhao, Songlin Wei, Jiazhao Zhang, Haoran Geng, Yaran Chen, Haoran Li, He Wang
Published	2025-03-21 07:52:16+00:00
arXiv page	arxiv_id(pdf)

Categories: cs.AI, cs.RO
