ViewActive: Active viewpoint optimization from a single image




When observing objects, humans benefit from their spatial visualization and mental rotation abilities, which let them envision potentially optimal viewpoints based on the current observation. This capability is crucial for enabling robots to achieve efficient and robust scene perception during operation, as optimal viewpoints provide essential and informative features for accurately representing scenes in 2D images, thereby enhancing downstream tasks. To endow robots with this human-like active viewpoint optimization capability, we propose ViewActive, a modernized machine learning approach drawing inspiration from aspect graphs, which provides viewpoint optimization guidance based solely on the current 2D image input. Specifically, we introduce the 3D Viewpoint Quality Field (VQF), a compact and consistent representation of the viewpoint quality distribution, similar to an aspect graph, composed of three general-purpose viewpoint quality metrics: self-occlusion ratio, occupancy-aware surface normal entropy, and visual entropy. We utilize pre-trained image encoders to extract robust visual and semantic features, which are then decoded into the 3D VQF, allowing our model to generalize effectively across diverse objects, including unseen categories. The lightweight ViewActive network (72 FPS on a single GPU) significantly enhances the performance of state-of-the-art object recognition pipelines and can be integrated into real-time motion planning for robotic applications. Our code and dataset are available here:
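As a rough illustration of one of the three metrics named above, the sketch below computes a Shannon entropy over the gray-level histogram of an image. The abstract does not define visual entropy, so the histogram-based definition, the function name `visual_entropy`, and the 256-bin choice are assumptions made for illustration and may differ from the authors' formulation.

```python
import numpy as np

def visual_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of the gray-level histogram of an 8-bit image.

    Assumption: a histogram-based definition, not necessarily the paper's.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

# A uniform image carries no gray-level variety; a half-black, half-white
# image uses two levels equally often.
flat = np.full((8, 8), 128, dtype=np.uint8)
split = np.zeros((8, 8), dtype=np.uint8)
split[:, 4:] = 255

flat_entropy = visual_entropy(flat)
split_entropy = visual_entropy(split)
```

Under this assumed definition, a view that exposes more varied appearance scores higher, which is the intuition behind ranking candidate viewpoints by such a metric.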


Authors: Jiayi Wu, Xiaomin Lin, Botao He, Cornelia Fermuller, Yiannis Aloimonos
Published: 2024-10-03 14:43:01+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.RO

Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation




There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval augmented generation (RAG) has become the workhorse of large-scale non-parametric knowledge; however, existing techniques do not directly transfer to the embodied domain, which is multimodal, where data is highly correlated, and where perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundational model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether for a specific object or a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 200 explanation and navigation queries across 19 environments, highlighting its promise as a general-purpose non-parametric system for embodied agents.


Authors: Quanting Xie, So Yeon Min, Tianyi Zhang, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk
Published: 2024-10-03 15:17:22+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.AI, cs.LG, cs.RO

Making Space for Time: The Special Galilean Group and Its Application to Some Robotics Problems




The special Galilean group, usually denoted SGal(3), is a 10-dimensional Lie group whose important subgroups include the special orthogonal group, the special Euclidean group, and the group of extended poses. We briefly describe SGal(3) and its Lie algebra and show how the group structure supports a unified representation of uncertainty in space and time. Our aim is to highlight the potential usefulness of this group for several robotics problems.
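To make the group structure concrete, the following sketch uses the commonly used 5x5 matrix form of a Galilean transformation, built from a rotation `C`, a velocity `v`, a position `r`, and a time offset `tau` (3 + 3 + 3 + 1 = 10 parameters, matching the stated dimension). The symbol and function names are choices made here for illustration, not notation confirmed by the abstract.

```python
import numpy as np

def sgal3(C, v, r, tau):
    """Assemble the 5x5 matrix [[C, v, r], [0, 1, tau], [0, 0, 1]]."""
    X = np.eye(5)
    X[:3, :3] = C
    X[:3, 3] = v
    X[:3, 4] = r
    X[3, 4] = tau
    return X

def rot_z(angle):
    """Rotation about the z-axis, used only to build example elements."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def sgal3_inverse(X):
    """Closed-form inverse; avoids a general matrix inversion."""
    C, v, r, tau = X[:3, :3], X[:3, 3], X[:3, 4], X[3, 4]
    return sgal3(C.T, -C.T @ v, -C.T @ (r - v * tau), -tau)

A = sgal3(rot_z(0.3), [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], 0.5)
B = sgal3(rot_z(-1.1), [0.0, 0.5, 0.0], [1.0, 1.0, 1.0], 2.0)
AB = A @ B  # group composition is matrix multiplication
```

In the product, the time offsets add and the position block picks up a velocity-times-time term, which is how space and time are coupled in a single group element.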


Authors: Jonathan Kelly
Published: 2024-10-03 15:29:17+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.RO, math.GR

Extremum Seeking Controlled Wiggling for Tactile Insertion




When humans perform insertion tasks such as inserting a cup into a cupboard, routing a cable, or inserting a key, they wiggle the object and observe the process through tactile and proprioceptive feedback. While recent advances in tactile sensors have resulted in tactile-based approaches, there has not been a generalized formulation based on wiggling similar to human behavior. Thus, we propose an extremum-seeking control law that can insert four keys into four types of locks without control parameter tuning despite significant variation in lock type. The resulting model-free formulation wiggles the end effector pose to maximize insertion depth while minimizing strain as measured by a GelSight Mini tactile sensor that grasps a key. The algorithm achieves a 71% success rate over 120 randomly initialized trials with uncertainty in both translation and orientation. Over 240 deterministically initialized trials, where only one translation or rotation parameter is perturbed, 84% of trials succeeded. Given tactile feedback at 13 Hz, the mean insertion times for these groups of trials are 262 and 147 seconds, respectively.
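The general idea of extremum seeking can be sketched with the textbook perturbation-based scheme in one dimension: add a small sinusoidal wiggle to the current estimate, high-pass the measured objective, multiply by the same sinusoid, and integrate. The scalar objective `toy_objective`, the gains, and the frequencies below are hypothetical; the control law in the paper acts on the end effector pose using tactile strain and insertion depth, which this sketch does not model.

```python
import math

def extremum_seek(objective, theta0, a=0.2, omega=10.0, k=1.0,
                  omega_h=1.0, dt=0.01, steps=10000):
    """Perturbation-based extremum seeking that climbs toward a maximum."""
    theta_hat = theta0
    eta = objective(theta0)  # slow running estimate of the objective's mean
    for i in range(steps):
        t = i * dt
        dither = math.sin(omega * t)
        y = objective(theta_hat + a * dither)      # probe with a small wiggle
        eta += dt * omega_h * (y - eta)            # low-pass; (y - eta) is high-passed
        theta_hat += dt * k * (y - eta) * dither   # demodulate and integrate
    return theta_hat

def toy_objective(theta):
    """Hypothetical objective with its maximum at theta = 2."""
    return -(theta - 2.0) ** 2

estimate = extremum_seek(toy_objective, theta0=0.0)
```

No model or gradient of the objective is supplied; the correlation between the wiggle and the measured response is what drives the estimate uphill, which is why such schemes are described as model-free.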


Authors: Levi Burner, Pavan Mantripragada, Gabriele M. Caddeo, Lorenzo Natale, Cornelia Fermüller, Yiannis Aloimonos
Published: 2024-10-03 15:37:11+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.RO

Trajectory Optimization with Global Yaw Parameterization for Field-of-View Constrained Autonomous Flight




Trajectory generation for quadrotors with limited field-of-view sensors has numerous applications such as aerial exploration, coverage, inspection, videography, and target tracking. Most previous works simplify the task of optimizing yaw trajectories by either aligning the heading of the robot with its velocity, or potentially restricting the feasible space of candidate trajectories by using a limited yaw domain to circumvent angular singularities. In this paper, we propose a novel global yaw parameterization method for trajectory optimization that allows a 360-degree yaw variation as demanded by the underlying algorithm. This approach effectively bypasses inherent singularities by including supplementary quadratic constraints and transforming the final decision variables into the desired state representation. This method significantly reduces the needed control effort and improves optimization feasibility. Furthermore, we apply the method to several examples of different applications that require jointly optimizing over both the yaw and position trajectories. Ultimately, we present a comprehensive numerical analysis and evaluation of our proposed method in both simulation and real-world experiments.
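The abstract does not spell out the parameterization. One common way to avoid angle wrap-around, assumed here purely for illustration, is to carry yaw as a planar vector (c, s) subject to the quadratic constraint c^2 + s^2 = 1 and to convert back to an angle with `atan2` at the end. The helper names below are hypothetical and this is not claimed to be the authors' method.

```python
import math

def yaw_to_vec(yaw):
    """Map an angle to a point on the unit circle."""
    return (math.cos(yaw), math.sin(yaw))

def vec_to_yaw(c, s):
    """Recover the angle from the vector form."""
    return math.atan2(s, c)

def unit_constraint_residual(c, s):
    """Quadratic equality c^2 + s^2 - 1 = 0 that an optimizer would enforce."""
    return c * c + s * s - 1.0

def blend(yaw_a, yaw_b, w):
    """Interpolate in (c, s) space, then project back onto the unit circle.

    Not defined for exactly opposite headings with w = 0.5 (zero-length vector).
    """
    ca, sa = yaw_to_vec(yaw_a)
    cb, sb = yaw_to_vec(yaw_b)
    c = (1 - w) * ca + w * cb
    s = (1 - w) * sa + w * sb
    n = math.hypot(c, s)
    return vec_to_yaw(c / n, s / n)

a, b = math.radians(170.0), math.radians(-170.0)
naive_mid = 0.5 * (a + b)      # averaging raw angles swings through 0 rad
vector_mid = blend(a, b, 0.5)  # the vector form stays near +/-pi
```

The two headings are only 20 degrees apart, yet averaging the raw angles points the vehicle in the opposite direction; working with the constrained vector avoids that discontinuity at the wrap-around.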


Authors: Yuwei Wu, Yuezhan Tao, Igor Spasojevic, Vijay Kumar
Published: 2024-10-03 17:28:52+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.RO

GUD: Generation with Unified Diffusion




Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates (e.g. pixel-, PCA-, Fourier-, or wavelet-basis), 2) the prior distribution that data is transformed into during diffusion (e.g. Gaussian with covariance $\Sigma$), and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom. In particular, we introduce soft-conditioning models that smoothly interpolate between standard diffusion models and autoregressive models (in any basis), conceptually bridging these two approaches. Our framework opens up a wide design space which may lead to more efficient training and data generation, and paves the way to novel architectures integrating different generative approaches and generation tasks.
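A component-wise noise schedule can be pictured as each coordinate of the data (in whichever basis is chosen) having its own signal and noise weights at a given diffusion time. The sketch below assumes a variance-preserving cosine-style schedule with a hypothetical per-component speed `rates`; the specific schedule and names are illustrative assumptions, not the framework's definitions.

```python
import numpy as np

def componentwise_forward(x0, t, rates, rng):
    """Noise x0 at time t in [0, 1] with one schedule per component.

    Component i follows the angle (pi/2) * clip(rates[i] * t, 0, 1), so a
    larger rate means that component reaches pure noise earlier.
    """
    phase = 0.5 * np.pi * np.clip(rates * t, 0.0, 1.0)
    alpha, sigma = np.cos(phase), np.sin(phase)  # alpha**2 + sigma**2 == 1
    eps = rng.standard_normal(x0.shape)
    return alpha * x0 + sigma * eps, eps

rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
rates = np.array([1.0, 2.0, 4.0])  # hypothetical per-component speeds

x_start, _ = componentwise_forward(x0, 0.0, rates, rng)
x_half, eps_half = componentwise_forward(x0, 0.5, rates, rng)
```

At t = 0.5 the two faster components are already pure noise while the first still retains signal. Setting all rates equal recovers an ordinary shared schedule, and very different rates noise the components nearly one after another, which gives some intuition for how such a schedule could interpolate between diffusion-like and sequential generation.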


Authors: Mathis Gerdes, Max Welling, Miranda C. N. Cheng
Published: 2024-10-03 16:51:14+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.LG, hep-th, stat.ML

DyGPrompt: Learning Feature and Time Prompts on Dynamic Graphs




Dynamic graphs capture evolving interactions between entities, such as in social networks, online learning platforms, and crowdsourcing projects. For dynamic graph modeling, dynamic graph neural networks (DGNNs) have emerged as a mainstream technique. However, they are generally pre-trained on the link prediction task, leaving a significant gap from the objectives of downstream tasks such as node classification. To bridge the gap, prompt-based learning has gained traction on graphs, but most existing efforts focus on static graphs, neglecting the evolution of dynamic graphs. In this paper, we propose DyGPrompt, a novel pre-training and prompt learning framework for dynamic graph modeling. First, we design dual prompts to address the gap in both task objectives and temporal variations across pre-training and downstream tasks. Second, we recognize that node and time features mutually characterize each other, and propose dual condition-nets to model the evolving node-time patterns in downstream tasks. Finally, we thoroughly evaluate and analyze DyGPrompt through extensive experiments on four public datasets.


Authors: Xingtong Yu, Zhenghao Liu, Yuan Fang, Xinming Zhang
Published: 2024-10-03 16:59:18+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.LG

Signature Isolation Forest




Functional Isolation Forest (FIF) is a recent state-of-the-art Anomaly Detection (AD) algorithm designed for functional data. It relies on a tree partition procedure where an abnormality score is computed by projecting each curve observation on a drawn dictionary through a linear inner product. Both the linear inner product and the dictionary are a priori choices that strongly influence the algorithm's performance and might lead to unreliable results, particularly with complex datasets. This work addresses these challenges by introducing Signature Isolation Forest, a novel class of AD algorithms leveraging the signature transform from rough path theory. Our objective is to remove the constraints imposed by FIF by proposing two algorithms that specifically target the linearity of the FIF inner product and the choice of the dictionary. We provide several numerical experiments, including a real-world applications benchmark showing the relevance of our methods.
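For readers unfamiliar with the signature transform, the sketch below computes its first two levels for a piecewise-linear path by applying Chen's identity one segment at a time. It illustrates the feature map only; how the proposed algorithms use signatures inside the isolation trees is not described in the abstract, and the function name is a choice made here.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array-like of shape (num_points, dim).
    Level 1 is the total increment; level 2 holds the iterated integrals.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append a straight segment with increment `delta`.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2

# Move right by one unit, then up by one unit.
s1, s2 = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])  # 0.5: area between path and chord
```

Unlike a linear projection, the second level depends on the order in which the coordinates move: traversing the same two steps in the opposite order flips the sign of the Lévy area.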


Authors: Marta Campi, Guillaume Staerman, Gareth W. Peters, Tomoko Matsui
Published: 2024-10-03 17:05:49+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.LG, stat.ML

Fair Allocation in Dynamic Mechanism Design




We consider a dynamic mechanism design problem where an auctioneer sells an indivisible good to groups of buyers in every round, for a total of $T$ rounds. The auctioneer aims to maximize their discounted overall revenue while adhering to a fairness constraint that guarantees a minimum average allocation for each group. We begin by studying the static case ($T=1$) and establish that the optimal mechanism involves two types of subsidization: one that increases the overall probability of allocation to all buyers, and another that favors the groups which otherwise have a lower probability of winning the item. We then extend our results to the dynamic case by characterizing a set of recursive functions that determine the optimal allocation and payments in each round. Notably, our results establish that in the dynamic case, the seller, on the one hand, commits to a participation bonus to incentivize truth-telling, and on the other hand, charges an entry fee for every round. Moreover, the optimal allocation once more involves subsidization, the extent of which depends on the difference in future utilities for both the seller and buyers when allocating the item to one group versus the others. Finally, we present an approximation scheme to solve the recursive equations and determine an approximately optimal and fair allocation efficiently.


Authors: Alireza Fallah, Michael I. Jordan, Annie Ulichney
Published: 2024-10-03 17:05:51+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.GT, cs.LG, econ.TH

Highly Adaptive Ridge




In this paper we propose the Highly Adaptive Ridge (HAR): a regression method that achieves an $n^{-1/3}$ dimension-free $L_2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives. This is a large nonparametric function class that is particularly appropriate for tabular data. HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion. We use simulation and real data to confirm our theory. We demonstrate better empirical performance than state-of-the-art algorithms, particularly on small datasets.
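The kernel view can be sketched as follows. Assume the basis consists of indicator products, one for every training point (knot) and every non-empty subset of coordinates, each equal to 1 when the input is at or above the knot in all chosen coordinates. Summing products of these features over all subsets gives the closed form K(a, b) = sum_i (2^{c_i} - 1), where c_i counts the coordinates j with knot_ij <= min(a_j, b_j). This basis, the closed form derived from it, the centering of the targets, and the penalty value are a reading of the abstract's description made for illustration, not formulas quoted from the paper.

```python
import numpy as np

def har_kernel(A, B, knots):
    """K(a, b) = sum_i (2 ** #{j : knots[i, j] <= min(a_j, b_j)} - 1)."""
    lower = np.minimum(A[:, None, :], B[None, :, :])        # (m, p, d)
    below = knots[None, None, :, :] <= lower[:, :, None, :]  # (m, p, n, d)
    counts = below.sum(axis=-1)                              # (m, p, n)
    return (2.0 ** counts - 1.0).sum(axis=-1)                # (m, p)

def fit_predict(X, y, X_new, lam=1e-2):
    """Kernel ridge regression on centered targets; lam is the ridge penalty."""
    y_mean = y.mean()
    K = har_kernel(X, X, X)  # knots are the training points themselves
    coef = np.linalg.solve(K + lam * np.eye(len(X)), y - y_mean)
    return har_kernel(X_new, X, X) @ coef + y_mean

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 2))
y = np.sin(4 * X[:, 0]) + (X[:, 1] > 0.5)  # smooth trend plus a jump
pred = fit_predict(X, y, X)
```

The point of the kernel form is cost: the explicit basis grows exponentially with the number of columns, while the kernel only needs a count per pair of points and per knot.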


Authors: Alejandro Schuler, Alexander Hagemeister, Mark van der Laan
Published: 2024-10-03 17:06:06+00:00
arXiv site: arxiv_id(pdf)

Source, service used: DeepL

Categories: cs.LG, stat.ML