TURTLMap: Real-time Localization and Dense Mapping of Low-texture Underwater Environments with a Low-cost Unmanned Underwater Vehicle


Significant work has been done on advancing localization and mapping in underwater environments. Still, state-of-the-art methods are challenged by low-texture environments, which are common in underwater settings. This makes it difficult to use existing methods in diverse, real-world scenes. In this paper, we present TURTLMap, a novel solution that focuses on textureless underwater environments through a real-time localization and mapping method. We show that this method is low-cost and capable of tracking the robot accurately while constructing a dense map of a low-texture environment in real time. We evaluate the proposed method using real-world data collected in an indoor water tank with a motion capture system and a ground truth reference map. Qualitative and quantitative results validate that the proposed system achieves accurate and robust localization and precise dense mapping, even when subject to wave conditions. The project page for TURTLMap is https://umfieldrobotics.github.io/TURTLMap.


Authors: Jingyu Song, Onur Bagoren, Razan Andigani, Advaith Venkatramanan Sethuraman, Katherine A. Skinner
Published: 2024-10-09 17:12:15+00:00
Source and services used: arxiv.jp, Google
Categories: cs.RO

FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation


We introduce a novel approach to manipulating articulated objects with ambiguities, such as opening a door, in which multi-modality and occlusions create ambiguities about the opening side and direction. Multi-modality occurs when the method to open a fully closed door (push, pull, slide) is uncertain, or the side from which it should be opened is uncertain. Occlusions further obscure the door's shape from certain angles, creating additional ambiguities while the occlusion lasts. To tackle these challenges, we propose a history-aware diffusion network that models the multi-modal distribution of the articulated object and uses history to disambiguate actions and make stable predictions under occlusions. Experiments and analysis demonstrate the state-of-the-art performance of our method and, specifically, improvements in ambiguity-caused failure modes. Our project website is available at https://flowbothd.github.io/.


Authors: Yishu Li, Wen Hui Leng, Yiming Fang, Ben Eisner, David Held
Published: 2024-10-09 17:23:04+00:00
Categories: cs.RO

VIRT: Vision Instructed Transformer for Robotic Manipulation


Robotic manipulation, owing to its multi-modal nature, often faces significant training ambiguity, necessitating explicit instructions to clearly delineate the manipulation details in tasks. In this work, we highlight that vision instruction is naturally more comprehensible to recent robotic policies than the commonly adopted text instruction, as these policies are born with some vision understanding ability like human infants. Building on this premise and drawing inspiration from cognitive science, we introduce the robotic imagery paradigm, which realizes large-scale robotic data pre-training without text annotations. Additionally, we propose the robotic gaze strategy that emulates the human eye gaze mechanism, thereby guiding subsequent actions and focusing the attention of the policy on the manipulated object. Leveraging these innovations, we develop VIRT, a fully Transformer-based policy. We design comprehensive tasks using both a physical robot and simulated environments to assess the efficacy of VIRT. The results indicate that VIRT can complete very competitive tasks like “opening the lid of a tightly sealed bottle”, and the proposed techniques boost the success rates of the baseline policy on diverse challenging tasks from nearly 0% to more than 65%.


Authors: Zhuoling Li, Liangliang Ren, Jinrong Yang, Yong Zhao, Xiaoyang Wu, Zhenhua Xu, Xiang Bai, Hengshuang Zhao
Published: 2024-10-09 17:59:06+00:00
Categories: cs.RO

Diffusion Density Estimators


We investigate the use of diffusion models as neural density estimators. The current approach to this problem involves converting the generative process to a smooth flow, known as the Probability Flow ODE. The log density at a given sample can be obtained by solving the ODE with a black-box solver. We introduce a new, highly parallelizable method that computes log densities without the need to solve a flow. Our approach is based on estimating a path integral by Monte Carlo, in a manner identical to the simulation-free training of diffusion models. We also study how different training parameters affect the accuracy of the density calculation, and offer insights into how these models can be made more scalable and efficient.
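
For reference, the flow-based baseline mentioned above is usually written as follows in the score-based diffusion literature. The notation (drift f, diffusion coefficient g, marginals p_t, horizon T) is assumed here and is not taken from the abstract; this is a sketch of the standard formulation, not of the paper's new estimator.

```latex
% Assumed forward SDE: dx = f(x,t)\,dt + g(t)\,dw, with marginal densities p_t.
\begin{align}
\frac{dx}{dt} &= \tilde f(x,t) := f(x,t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x), \\
\log p_0\big(x(0)\big) &= \log p_T\big(x(T)\big) + \int_0^T \nabla_x \cdot \tilde f\big(x(t), t\big)\, dt .
\end{align}
```

Evaluating the second line requires integrating the ODE along the whole trajectory from x(0) to x(T), which is the sequential solve that the proposed method is said to avoid.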


Authors: Akhil Premkumar
Published: 2024-10-09 15:21:53+00:00
Categories: cs.LG, stat.ML

Symbolic Recovery of Differential Equations: The Identifiability Problem




Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations with the use of machine learning techniques. In contrast to classical methods, which assume the structure of the equation to be known and focus on the estimation of specific parameters, these algorithms aim to learn the structure and the parameters simultaneously. While the uniqueness and, therefore, the identifiability of parameters of governing equations is a well-addressed problem in the field of parameter estimation, it has not been investigated for symbolic recovery. However, this problem should be even more present in this field, since the algorithms aim to cover larger spaces of governing equations. In this paper, we investigate under which conditions a solution of a differential equation does not uniquely determine the equation itself. For various classes of differential equations, we provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation. We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely. Finally, we provide extensive numerical experiments showing that our algorithms can indeed guarantee the uniqueness of the learned governing differential equation, without assuming any knowledge about the analytic form of the function, thereby ensuring the reliability of the learned equation.
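
A small illustration of the non-uniqueness described above may help; it is not the paper's algorithm. Assume a hypothetical candidate family u' = c1*u + c2*u^2. The constant function u(t) = 1 solves every member with c1 + c2 = 0, so it cannot identify the equation, whereas the non-constant logistic solution u(t) = 1 / (1 + exp(-t)) of u' = u - u^2 can. The sketch below checks this through the Gram determinant of the two candidate columns; the helper name and the sampling grid are illustrative choices.

```python
import math

def gram_determinant(col_a, col_b):
    """Determinant of the 2x2 Gram matrix of two sampled feature columns.

    A value of (nearly) zero means the columns are linearly dependent, so a
    least-squares fit over them has no unique coefficient vector.
    """
    aa = sum(a * a for a in col_a)
    bb = sum(b * b for b in col_b)
    ab = sum(a * b for a, b in zip(col_a, col_b))
    return aa * bb - ab * ab

# Sample points t = -3, -2, ..., 3 (an arbitrary illustrative grid).
ts = [float(t) for t in range(-3, 4)]

# Equilibrium solution u(t) = 1: the candidate terms u and u^2 coincide.
u_const = [1.0 for _ in ts]
det_const = gram_determinant(u_const, [u * u for u in u_const])

# Non-constant solution u(t) = 1 / (1 + exp(-t)) of u' = u - u^2.
u_logistic = [1.0 / (1.0 + math.exp(-t)) for t in ts]
det_logistic = gram_determinant(u_logistic, [u * u for u in u_logistic])

print("constant solution:", det_const)
print("logistic solution:", det_logistic)
```

A vanishing determinant for the constant solution signals that the data cannot separate the two candidate terms, while the strictly positive value for the logistic solution means the coefficients are pinned down within this family.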


Authors: Philipp Scholl, Aras Bacho, Holger Boche, Gitta Kutyniok
Published: 2024-10-09 15:27:08+00:00
Categories: cs.LG, math-ph, math.MP

Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax


Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximization of the mutual information between the input and the output of a deep neural network encoder. Although DIM and contrastive SSRL in general are well explored, the task of learning representations conforming to a specific distribution (i.e., distribution matching, DM) is still under-addressed. Motivated by the importance of DM to several downstream tasks (including generative modeling, disentanglement, outlier detection, and others), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting independent noise into the normalized outputs of the encoder, while keeping the same InfoMax training objective. We show that this modification allows for learning uniformly and normally distributed representations, as well as representations of other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between the performance on the downstream tasks and the quality of DM.
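
The noise-injection step can be pictured with a minimal sketch. The abstract does not say which normalization or which noise distribution is used, so L2 normalization and Gaussian noise are assumptions made here for illustration, and the function names are hypothetical.

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit Euclidean norm (assumed form of normalization)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def inject_noise(encoder_output, sigma, rng):
    """Normalize the encoder output, then add independent Gaussian noise
    to every coordinate (the noise distribution is an assumption)."""
    normalized = l2_normalize(encoder_output)
    return [v + rng.gauss(0.0, sigma) for v in normalized]

rng = random.Random(0)
raw_output = [3.0, 4.0]  # stand-in for an encoder output
noisy = inject_noise(raw_output, sigma=0.1, rng=rng)
clean = inject_noise(raw_output, sigma=0.0, rng=rng)  # sigma = 0 leaves only the normalization
print(noisy, clean)
```

In a training loop, the noisy vector would be the representation fed to the InfoMax objective; that objective is not reproduced here.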


Authors: Ivan Butakov, Alexander Sememenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov
Published: 2024-10-09 15:40:04+00:00
Categories: (Primary), 94A17, cs.IT, cs.LG, H.1.1, math.IT, stat.ML

Through the Looking Glass: Mirror Schrödinger Bridges


Resampling from a target measure whose density is unknown is a fundamental problem in mathematical statistics and machine learning. A setting that dominates the machine learning literature consists of learning a map from an easy-to-sample prior, such as the Gaussian distribution, to a target measure. Under this model, samples from the prior are pushed forward to generate a new sample on the target measure, which is often difficult to sample from directly. In this paper, we propose a new model for conditional resampling called mirror Schrödinger bridges. Our key observation is that solving the Schrödinger bridge problem between a distribution and itself provides a natural way to produce new samples from conditional distributions, giving in-distribution variations of an input data point. We show how to efficiently solve this largely overlooked version of the Schrödinger bridge problem. We prove that our proposed method leads to significant algorithmic simplifications over existing alternatives, in addition to providing control over in-distribution variation. Empirically, we demonstrate how these benefits can be leveraged to produce proximal samples in a number of application domains.


Authors: Leticia Mattos Da Silva, Silvia Sellán, Justin Solomon
Published: 2024-10-09 15:48:56+00:00
Categories: cs.LG

Causal Representation Learning in Temporal Data via Single-Parent Decoding


Scientific research often seeks to understand the causal structure underlying high-level variables in a system. For example, climate scientists study how phenomena, such as El Niño, affect other climate processes at remote locations across the globe. However, scientists typically collect low-level measurements, such as geographically distributed temperature readings. From these, one needs to learn both a mapping to causally-relevant latent variables, such as a high-level representation of the El Niño phenomenon and other processes, as well as the causal model over them. The challenge is that this task, called causal representation learning, is highly underdetermined from observational data alone, requiring other constraints during learning to resolve the indeterminacies. In this work, we consider a temporal model with a sparsity assumption, namely single-parent decoding: each observed low-level variable is only affected by a single latent variable. Such an assumption is reasonable in many scientific applications that require finding groups of low-level variables, such as extracting regions from geographically gridded measurement data in climate research or capturing brain regions from neural activity data. We demonstrate the identifiability of the resulting model and propose a differentiable method, Causal Discovery with Single-parent Decoding (CDSD), that simultaneously learns the underlying latents and a causal graph over them. We assess the validity of our theoretical results using simulated data and showcase the practical validity of our method in an application to real-world data from the climate science field.
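
The single-parent decoding assumption can be made concrete with a toy decoder. The sketch below assumes a linear, noise-free decoder purely for illustration; the function names and the weight range are hypothetical, and nothing here reproduces CDSD itself.

```python
import random

def make_single_parent_decoder(parents, rng):
    """Build a toy decoder in which observed variable i depends only on
    latent parents[i]; each entry is a (parent index, weight) pair."""
    return [(p, rng.uniform(0.5, 1.5)) for p in parents]

def decode(latents, decoder):
    """Map latent values to observations; each output reads exactly one latent."""
    return [w * latents[p] for p, w in decoder]

def groups_from_parents(parents, n_latents):
    """Group observed variables by their single latent parent
    (for example, grid cells that form one region)."""
    groups = [[] for _ in range(n_latents)]
    for i, p in enumerate(parents):
        groups[p].append(i)
    return groups

rng = random.Random(0)
parents = [0, 0, 1, 1, 1]  # five observed variables, two latents
decoder = make_single_parent_decoder(parents, rng)

x_before = decode([2.0, -1.0], decoder)
x_after = decode([2.0, 3.0], decoder)  # only latent 1 changes
print(groups_from_parents(parents, 2))
```

Changing one latent moves only the observations in its group, which is why the assumption partitions the low-level variables into regions.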


Authors: Philippe Brouillard, Sébastien Lachapelle, Julia Kaltenborn, Yaniv Gurwicz, Dhanya Sridhar, Alexandre Drouin, Peer Nowack, Jakob Runge, David Rolnick
Published: 2024-10-09 15:57:50+00:00
Categories: cs.LG

Optimizing Estimators of Squared Calibration Errors in Classification


In this work, we propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors in practical settings. Improving the calibration of classifiers is crucial for enhancing the trustworthiness and interpretability of machine learning models, especially in sensitive decision-making scenarios. Although various calibration (error) estimators exist in the current literature, there is a lack of guidance on selecting the appropriate estimator and tuning its hyperparameters. By leveraging the bilinear structure of squared calibration errors, we reformulate calibration estimation as a regression problem with independent and identically distributed (i.i.d.) input pairs. This reformulation allows us to quantify the performance of different estimators even for the most challenging calibration criterion, known as canonical calibration. Our approach advocates for a training-validation-testing pipeline when estimating a calibration error on an evaluation dataset. We demonstrate the effectiveness of our pipeline by optimizing existing calibration estimators and comparing them with novel kernel ridge regression-based estimators on standard image classification tasks.
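
To make the notion of a calibration estimator with a tunable hyperparameter concrete, here is a minimal sketch of a common binned estimator of the squared top-label calibration error. It is not the risk or the kernel ridge regression estimator proposed in the paper; the function name and the equal-width binning are illustrative assumptions, and `n_bins` is the kind of hyperparameter whose choice the abstract says lacks guidance.

```python
def binned_squared_calibration_error(confidences, correct, n_bins=10):
    """Binned estimate of the squared top-label calibration error.

    confidences: predicted top-label probabilities in [0, 1]
    correct:     1 if the top-label prediction was right, else 0
    n_bins:      number of equal-width bins (a hyperparameter)
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, hit))

    n_samples = len(confidences)
    total = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        # Weight each bin's squared gap by its share of the samples.
        total += (len(members) / n_samples) * (mean_conf - accuracy) ** 2
    return total

calibrated = binned_squared_calibration_error([0.75] * 4, [1, 1, 1, 0])
overconfident = binned_squared_calibration_error([0.95] * 4, [1, 1, 0, 0])
print(calibrated, overconfident)
```

Different values of `n_bins` generally give different estimates on the same data, which is the selection problem a training-validation-testing pipeline is meant to address.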


Authors: Sebastian G. Gruber, Francis Bach
Published: 2024-10-09 15:58:06+00:00
Categories: cs.LG, stat.ML

The Vital Role of Gradient Clipping in Byzantine-Resilient Distributed Learning


Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, the currently considered static clipping strategy exhibits mixed results: improving robustness against some attacks while being ineffective or detrimental against others. We address this gap by proposing a principled adaptive clipping strategy, termed Adaptive Robust Clipping (ARC). We show that ARC consistently enhances the empirical robustness of SOTA Robust-DGD methods, while preserving the theoretical robustness guarantees. Our analysis shows that ARC provably improves the asymptotic convergence guarantee of Robust-DGD in the case when the model is well-initialized. We validate this theoretical insight through an exhaustive set of experiments on benchmark image classification tasks. We observe that the improvement induced by ARC is more pronounced in highly heterogeneous and adversarial settings.
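
Pre-aggregation clipping with a static threshold, the baseline the abstract contrasts with ARC, can be sketched as follows. The coordinate-wise median is used here as one standard robust aggregator chosen for illustration; the abstract names neither a specific aggregator nor ARC's adaptive rule, so neither is reproduced, and all function names are hypothetical.

```python
import math
import statistics

def clip_by_norm(grad, threshold):
    """Rescale a gradient so that its Euclidean norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

def coordinate_wise_median(grads):
    """Robust aggregation: take the median of each coordinate across workers."""
    return [statistics.median(coord) for coord in zip(*grads)]

def clipped_robust_aggregate(grads, threshold):
    """Static pre-aggregation clipping followed by robust aggregation."""
    return coordinate_wise_median([clip_by_norm(g, threshold) for g in grads])

honest = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]
byzantine = [[1000.0, 1000.0], [-1000.0, -1000.0]]  # adversarial workers
aggregate = clipped_robust_aggregate(honest + byzantine, threshold=2.0)
print(aggregate)
```

The threshold is fixed in advance here, which is the static strategy the abstract reports as giving mixed results across attacks.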


Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan
Published: 2024-10-09 16:04:01+00:00
Categories: cs.LG