jarxiv | Japanese arxiv | Page 152

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

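Posted: June 10, 2025 | Author: jarxiv

Summary

We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos.
Using 4D Gaussians as an inductive bias, 4DGT unifies static and dynamic components, enabling it to model complex, time-varying environments with varying object lifespans.
We propose a novel density control strategy for training, which lets 4DGT handle longer space-time inputs while keeping rendering efficient at runtime.
The model processes 64 consecutive posed frames in a rolling-window fashion and predicts consistent 4D Gaussians for the scene.
Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences.
Trained only on large-scale monocular posed video datasets, 4DGT significantly outperforms prior Gaussian-based networks on real-world videos and achieves accuracy on par with optimization-based methods on cross-domain videos.
Project page: https://4dgt.github.io

Summary (original)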

We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We proposed a novel density control strategy in training, which enables our 4DGT to handle longer space-time input and remain efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can outperform prior Gaussian-based networks significantly in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos. Project page: https://4dgt.github.io
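
The rolling-window processing mentioned in the abstract can be pictured with a short sketch. This is only an illustration: the abstract states a window of 64 consecutive posed frames but not how far the window advances between predictions, so the `stride` parameter, the end-aligned final window, and the `rolling_windows` helper itself are assumptions rather than 4DGT's implementation.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def rolling_windows(
    frames: Sequence[T], window: int = 64, stride: int = 32
) -> Iterator[Sequence[T]]:
    """Yield fixed-size windows of consecutive frames.

    `window=64` follows the abstract; `stride` is a hypothetical choice.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(frames) <= window:
        # Short clip: a single (possibly shorter) window covers everything.
        yield frames
        return
    start = 0
    while start + window < len(frames):
        yield frames[start:start + window]
        start += stride
    # Final window is aligned to the end so trailing frames are not dropped.
    yield frames[len(frames) - window:]


if __name__ == "__main__":
    video = list(range(150))  # stand-in for 150 posed frames
    for chunk in rolling_windows(video):
        print(chunk[0], chunk[-1], len(chunk))
```

Each yielded window would be passed to the feed-forward model on its own; how predictions from overlapping windows are merged is not described in the abstract and is left out here.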

arXiv information

Authors	Zhen Xu, Zhengqin Li, Zhao Dong, Xiaowei Zhou, Richard Newcombe, Zhaoyang Lv
Published	2025-06-09 17:59:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

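Posted: June 10, 2025 | Author: jarxiv

Summary

Multi-task learning for dense prediction is limited by the need for extensive annotation of every task, although recent work has explored training with partial task labels.
Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks.
Our method, StableMTL, repurposes image generators for latent regression.
It adapts a denoising framework with task encoding, per-task conditioning, and a tailored training scheme.
Instead of per-task losses that require careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks.
To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing.
StableMTL outperforms baselines on 7 tasks across 8 benchmarks.

Summary (original)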

Multi-task learning for dense prediction is limited by the need for extensive annotation for every task, though recent works have explored training with partial task labels. Leveraging the generalization power of diffusion models, we extend the partial learning setup to a zero-shot setting, training a multi-task model on multiple synthetic datasets, each labeled for only a subset of tasks. Our method, StableMTL, repurposes image generators for latent regression. Adapting a denoising framework with task encoding, per-task conditioning and a tailored training scheme. Instead of per-task losses requiring careful balancing, a unified latent loss is adopted, enabling seamless scaling to more tasks. To encourage inter-task synergy, we introduce a multi-stream model with a task-attention mechanism that converts N-to-N task interactions into efficient 1-to-N attention, promoting effective cross-task sharing. StableMTL outperforms baselines on 7 tasks across 8 benchmarks.

arXiv information

Authors	Anh-Quan Cao, Ivan Lopes, Raoul de Charette
Published	2025-06-09 17:59:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV, cs.LG

Distillation Robustifies Unlearning

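Posted: June 10, 2025 | Author: jarxiv

Summary

Current LLM unlearning methods are not robust: they can be reverted easily with a few steps of finetuning.
This holds even for the idealized unlearning method of training to imitate an oracle model that was never exposed to the unwanted information, which suggests that output-based finetuning is insufficient to achieve robust unlearning.
In a similar vein, we find that training a randomly initialized student to imitate an unlearned model transfers the desired behaviors while leaving undesired capabilities behind.
In other words, distillation robustifies unlearning.
Building on this insight, we propose Unlearn-Noise-Distill-on-Outputs (UNDO), a scalable method that distills an unlearned model into a partially noised copy of itself.
UNDO introduces a tunable tradeoff between compute cost and robustness, establishing a new Pareto frontier on synthetic language and arithmetic tasks.
At its strongest setting, UNDO matches the robustness of a model retrained from scratch with perfect data filtering while using only 60-80% of the compute and requiring only 0.01% of the pretraining data to be labeled.
We also show that UNDO robustifies unlearning on the more realistic Weapons of Mass Destruction Proxy (WMDP) benchmark.
Since distillation is widely used in practice, incorporating an unlearning step beforehand offers a convenient path to robust capability removal.

Summary (original)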

Current LLM unlearning methods are not robust: they can be reverted easily with a few steps of finetuning. This is true even for the idealized unlearning method of training to imitate an oracle model that was never exposed to unwanted information, suggesting that output-based finetuning is insufficient to achieve robust unlearning. In a similar vein, we find that training a randomly initialized student to imitate an unlearned model transfers desired behaviors while leaving undesired capabilities behind. In other words, distillation robustifies unlearning. Building on this insight, we propose Unlearn-Noise-Distill-on-Outputs (UNDO), a scalable method that distills an unlearned model into a partially noised copy of itself. UNDO introduces a tunable tradeoff between compute cost and robustness, establishing a new Pareto frontier on synthetic language and arithmetic tasks. At its strongest setting, UNDO matches the robustness of a model retrained from scratch with perfect data filtering while using only 60-80% of the compute and requiring only 0.01% of the pretraining data to be labeled. We also show that UNDO robustifies unlearning on the more realistic Weapons of Mass Destruction Proxy (WMDP) benchmark. Since distillation is widely used in practice, incorporating an unlearning step beforehand offers a convenient path to robust capability removal.
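
To make the "noise, then distill on outputs" idea concrete, here is a minimal toy sketch. The abstract only says that UNDO distills an unlearned model into a partially noised copy of itself; the specific interpolation toward Gaussian noise, the `alpha` and `noise_std` parameters, and the use of a KL divergence on output distributions are assumptions made for illustration and should not be read as the authors' exact procedure.

```python
import math
import random
from typing import List


def partially_noise(
    weights: List[float], alpha: float, rng: random.Random, noise_std: float = 0.02
) -> List[float]:
    """Blend each weight with fresh Gaussian noise (assumed form).

    alpha = 0 keeps the unlearned weights; alpha = 1 is a fully random init.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * w + alpha * rng.gauss(0.0, noise_std) for w in weights]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_logits: List[float], student_logits: List[float]) -> float:
    """KL(teacher || student) on output distributions, a common distillation loss."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    unlearned = [0.5, -1.2, 0.3]
    student_init = partially_noise(unlearned, alpha=0.6, rng=rng)
    print(student_init)
    print(distill_loss([2.0, 0.1, -1.0], [1.5, 0.3, -0.8]))
```

In this picture, `alpha` plays the role of the tunable knob the abstract describes: more noise means more distillation compute is needed to recover the desired behavior, in exchange for (per the abstract) greater robustness.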

arXiv information

Authors	Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner
Published	2025-06-09 17:28:11+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

MIRIAD: Augmenting LLMs with millions of medical query-response pairs

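Posted: June 10, 2025 | Author: jarxiv

Summary

LLMs are bound to transform healthcare with advanced decision support and flexible chat assistants.
However, LLMs are prone to generating inaccurate medical content.
To ground LLMs in high-quality medical knowledge, they have been equipped with external knowledge via RAG, in which unstructured medical knowledge is split into small text chunks that can be selectively retrieved and integrated into the LLM's context.
Yet existing RAG pipelines rely on raw, unstructured medical text, which can be noisy, uncurated, and difficult for LLMs to leverage effectively.
Systematic approaches to organizing medical knowledge so that it is best surfaced to LLMs are generally lacking.
To address these challenges, we introduce MIRIAD, a large-scale, curated corpus of 5,821,948 medical QA pairs, each rephrased from and grounded in a passage from peer-reviewed medical literature using a semi-automated pipeline that combines LLM generation, filtering, grounding, and human annotation.
Unlike prior medical corpora, which rely on unstructured text, MIRIAD encapsulates web-scale medical knowledge in an operationalized query-response format, which enables more targeted retrieval.
Experiments on challenging medical QA benchmarks show that augmenting LLMs with MIRIAD improves accuracy by up to 6.7% compared with unstructured RAG baselines that use the same source corpus and the same amount of retrieved text.
Moreover, MIRIAD improves the ability of LLMs to detect medical hallucinations by 22.5 to 37% (increase in F1 score).
We further introduce MIRIAD-Atlas, an interactive map of MIRIAD spanning 56 medical disciplines, which lets clinical users visually explore, search, and refine medical knowledge.
MIRIAD promises to unlock a wealth of downstream applications, including medical information retrievers, enhanced RAG applications, and knowledge-grounded chat interfaces, ultimately enabling more reliable LLM applications in healthcare.

Summary (original)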

LLMs are bound to transform healthcare with advanced decision support and flexible chat assistants. However, LLMs are prone to generate inaccurate medical content. To ground LLMs in high-quality medical knowledge, LLMs have been equipped with external knowledge via RAG, where unstructured medical knowledge is split into small text chunks that can be selectively retrieved and integrated into the LLMs context. Yet, existing RAG pipelines rely on raw, unstructured medical text, which can be noisy, uncurated and difficult for LLMs to effectively leverage. Systematic approaches to organize medical knowledge to best surface it to LLMs are generally lacking. To address these challenges, we introduce MIRIAD, a large-scale, curated corpus of 5,821,948 medical QA pairs, each rephrased from and grounded in a passage from peer-reviewed medical literature using a semi-automated pipeline combining LLM generation, filtering, grounding, and human annotation. Unlike prior medical corpora, which rely on unstructured text, MIRIAD encapsulates web-scale medical knowledge in an operationalized query-response format, which enables more targeted retrieval. Experiments on challenging medical QA benchmarks show that augmenting LLMs with MIRIAD improves accuracy up to 6.7% compared to unstructured RAG baselines with the same source corpus and with the same amount of retrieved text. Moreover, MIRIAD improved the ability of LLMs to detect medical hallucinations by 22.5 to 37% (increase in F1 score). We further introduce MIRIAD-Atlas, an interactive map of MIRIAD spanning 56 medical disciplines, enabling clinical users to visually explore, search, and refine medical knowledge. MIRIAD promises to unlock a wealth of down-stream applications, including medical information retrievers, enhanced RAG applications, and knowledge-grounded chat interfaces, which ultimately enables more reliable LLM applications in healthcare.
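
The query-response retrieval format described above can be illustrated with a toy retriever. This is a sketch under stated assumptions, not MIRIAD's pipeline: the two QA pairs are placeholder examples written for this illustration (they are not drawn from the corpus), and token-overlap (Jaccard) scoring stands in for whatever embedding-based retriever a real system would use.

```python
import re
from typing import List, Set, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, qa_pairs: List[QAPair], k: int = 1) -> List[QAPair]:
    """Return the k pairs whose *question* best overlaps the query (Jaccard)."""
    q = tokens(query)

    def score(pair: QAPair) -> float:
        t = tokens(pair[0])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(qa_pairs, key=score, reverse=True)[:k]


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    corpus = [
        ("What vitamin deficiency causes scurvy?", "Vitamin C deficiency."),
        ("Which organ produces insulin?", "The pancreas."),
    ]
    question, answer = retrieve("Which organ makes insulin?", corpus)[0]
    # The retrieved pair would be placed into the LLM prompt as grounding context.
    print(f"Q: {question}\nA: {answer}")
```

Matching a user query against stored questions, rather than against arbitrary text chunks, is one way to read the abstract's claim that the query-response format "enables more targeted retrieval".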

arXiv information

Authors	Qinyue Zheng, Salman Abdullah, Sam Rawal, Cyril Zakka, Sophie Ostmeier, Maximilian Purk, Eduardo Reis, Eric J. Topol, Jure Leskovec, Michael Moor
Published	2025-06-09 14:21:53+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CL, I.2.7

Text-to-LoRA: Instant Transformer Adaption

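Posted: June 10, 2025 | Author: jarxiv

Summary

Foundation models provide a general tool for rapid content creation, but they regularly require task-specific adaptation.
Traditionally, this exercise involves careful curation of datasets and repeated fine-tuning of the underlying model.
Fine-tuning techniques let practitioners adapt foundation models to many new applications, but they require expensive and lengthy training and are notably sensitive to hyperparameter choices.
To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly based solely on a natural language description of the target task.
T2L is a hypernetwork trained to construct LoRAs in a single inexpensive forward pass.
After training T2L on a suite of 9 pre-trained LoRA adapters (GSM8K, Arc, etc.), we show that the ad-hoc reconstructed LoRA instances match the performance of task-specific adapters across the corresponding test sets.
Furthermore, T2L can compress hundreds of LoRA instances and generalize zero-shot to entirely unseen tasks.
This approach is a significant step towards democratizing the specialization of foundation models and enables language-based adaptation with minimal compute requirements.
Our code is available at https://github.com/SakanaAI/text-to-lora

Summary (original)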

While Foundation Models provide a general tool for rapid content creation, they regularly require task-specific adaptation. Traditionally, this exercise involves careful curation of datasets and repeated fine-tuning of the underlying model. Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably sensitive to hyperparameter choices. To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly solely based on a natural language description of the target task. T2L is a hypernetwork trained to construct LoRAs in a single inexpensive forward pass. After training T2L on a suite of 9 pre-trained LoRA adapters (GSM8K, Arc, etc.), we show that the ad-hoc reconstructed LoRA instances match the performance of task-specific adapters across the corresponding test sets. Furthermore, T2L can compress hundreds of LoRA instances and zero-shot generalize to entirely unseen tasks. This approach provides a significant step towards democratizing the specialization of foundation models and enables language-based adaptation with minimal compute requirements. Our code is available at https://github.com/SakanaAI/text-to-lora
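
The idea of a hypernetwork that emits a LoRA in one forward pass can be sketched in a few lines. The low-rank update W + BA is the standard LoRA form; everything else here is a hypothetical toy: `ToyHyperNetwork` is a single untrained linear map, the task embedding is a hand-written vector, and the dimensions are arbitrary. It is not the T2L architecture from the paper or its repository.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyHyperNetwork:
    """Maps a task embedding to LoRA factors A (rank x d_in) and B (d_out x rank)."""

    def __init__(self, emb_dim: int, d_out: int, d_in: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        n_params = rank * d_in + d_out * rank
        # One linear layer: every LoRA entry is a weighted sum of embedding entries.
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n_params)]

    def generate(self, task_emb: List[float]) -> Tuple[Matrix, Matrix]:
        flat = [sum(w * e for w, e in zip(row, task_emb)) for row in self.proj]
        r, d_in, d_out = self.rank, self.d_in, self.d_out
        a_flat, b_flat = flat[: r * d_in], flat[r * d_in:]
        A = [a_flat[i * d_in:(i + 1) * d_in] for i in range(r)]
        B = [b_flat[i * r:(i + 1) * r] for i in range(d_out)]
        return A, B


def apply_lora(W: Matrix, A: Matrix, B: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A), the standard LoRA weight update."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    hyper = ToyHyperNetwork(emb_dim=4, d_out=3, d_in=5, rank=2)
    task_embedding = [0.2, -0.1, 0.7, 0.0]  # stand-in for an encoded task description
    A, B = hyper.generate(task_embedding)
    W = [[0.0] * 5 for _ in range(3)]
    print(apply_lora(W, A, B))
```

The point of the sketch is the data flow: a text-derived vector goes in, low-rank factors come out, and no gradient steps on the base model are needed at adaptation time.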

arXiv information

Authors	Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange
Published	2025-06-09 14:19:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

A Cognac Shot To Forget Bad Memories: Corrective Unlearning for Graph Neural Networks

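Posted: June 10, 2025 | Author: jarxiv

Summary

Graph Neural Networks (GNNs) are increasingly used for a variety of ML applications on graph data.
Because graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, which degrades the model's performance.
To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning.
We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known.
We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified.
It recovers most of the performance of a strong oracle with fully corrected training data, even beating retraining from scratch without the deletion set, while being 8x more efficient.
We hope our work helps GNN developers mitigate, post-training, the harmful effects caused by issues in real-world data.
Our code is publicly available at https://github.com/cognac-gnn-unlearning/corrective-unlearning-for-gnns

Summary (original)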

Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. Because graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, which deteriorates the model’s performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified. It recovers most of the performance of a strong oracle with fully corrected training data, even beating retraining from scratch without the deletion set while being 8x more efficient. We hope our work assists GNN developers in mitigating harmful effects caused by issues in real-world data, post-training. Our code is publicly available at https://github.com/cognac-gnn-unlearning/corrective-unlearning-for-gnns

arXiv information

Authors	Varshita Kolipaka, Akshit Sinha, Debangan Mishra, Sumit Kumar, Arvindh Arun, Shashwat Goel, Ponnurangam Kumaraguru
Published	2025-06-09 15:05:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CR, cs.LG

Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method

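Posted: June 10, 2025 | Author: jarxiv

Summary

Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping and plays a vital role in ensuring food security.
It requires simultaneously capturing both phenological dynamics (obtained from multi-temporal satellite data such as Sentinel-2) and subtle spectral variations (which demand nanometer-scale spectral resolution from hyperspectral imagery).
Research combining these two modalities remains scarce because of the challenges of hyperspectral data acquisition and the cost of crop type annotation.
To address these issues, we construct a hierarchical hyperspectral crop dataset (H2Crop) by integrating 30m-resolution EnMAP hyperspectral data with Sentinel-2 time series.
With over one million annotated field parcels organized in a four-tier crop taxonomy, H2Crop establishes a vital benchmark for fine-grained agricultural crop classification and hyperspectral image processing.
We propose a dual-stream Transformer architecture that processes these modalities synergistically.
It coordinates two specialized pathways: a spectral-spatial Transformer extracts fine-grained signatures from hyperspectral EnMAP data, while a temporal Swin Transformer extracts crop growth patterns from Sentinel-2 time series.
A hierarchical classification head with hierarchical fusion then delivers multi-level crop type classification across all taxonomic tiers simultaneously.
Experiments show that adding hyperspectral EnMAP data to Sentinel-2 time series improves the average F1-score by 4.2% (peaking at 6.3%).
Extensive comparisons also confirm our method's higher accuracy over existing deep learning approaches for crop type classification and the consistent benefits of hyperspectral data across varying temporal windows and crop change scenarios.
Code and dataset are available at https://github.com/flyakon/H2Crop.

Summary (original)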

Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping and plays a vital role in ensuring food security. It requires simultaneous capture of both phenological dynamics (obtained from multi-temporal satellite data like Sentinel-2) and subtle spectral variations (demanding nanometer-scale spectral resolution from hyperspectral imagery). Research combining these two modalities remains scarce currently due to challenges in hyperspectral data acquisition and crop types annotation costs. To address these issues, we construct a hierarchical hyperspectral crop dataset (H2Crop) by integrating 30m-resolution EnMAP hyperspectral data with Sentinel-2 time series. With over one million annotated field parcels organized in a four-tier crop taxonomy, H2Crop establishes a vital benchmark for fine-grained agricultural crop classification and hyperspectral image processing. We propose a dual-stream Transformer architecture that synergistically processes these modalities. It coordinates two specialized pathways: a spectral-spatial Transformer extracts fine-grained signatures from hyperspectral EnMAP data, while a temporal Swin Transformer extracts crop growth patterns from Sentinel-2 time series. The designed hierarchical classification head with hierarchical fusion then simultaneously delivers multi-level crop type classification across all taxonomic tiers. Experiments demonstrate that adding hyperspectral EnMAP data to Sentinel-2 time series yields a 4.2% average F1-scores improvement (peaking at 6.3%). Extensive comparisons also confirm our method’s higher accuracy over existing deep learning approaches for crop type classification and the consistent benefits of hyperspectral data across varying temporal windows and crop change scenarios. Codes and dataset are available at https://github.com/flyakon/H2Crop.

arXiv information

Authors	Wenyuan Li, Shunlin Liang, Yuxiang Zhang, Liqin Liu, Keyan Chen, Yongzhe Chen, Han Ma, Jianglei Xu, Yichuan Ma, Shikang Guan, Zhenwei Shi
Published	2025-06-09 14:30:14+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG

Cartridges: Lightweight and general-purpose long context representations via self-study

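Posted: June 10, 2025 | Author: jarxiv

Summary

Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL).
Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length.
We explore an alternative: training a smaller KV cache offline on each corpus.
At inference time, we load this trained KV cache, which we call a Cartridge, and decode a response.
Critically, the cost of training a Cartridge can be amortized across all queries that reference the same corpus.
However, we find that the naive approach of training the Cartridge with next-token prediction on the corpus is not competitive with ICL.
Instead, we propose self-study, a training recipe in which we generate synthetic conversations about the corpus and train the Cartridge with a context-distillation objective.
We find that Cartridges trained with self-study replicate the functionality of ICL while being significantly cheaper to serve.
On challenging long-context benchmarks, Cartridges trained with self-study match ICL performance while using 38.6x less memory and enabling 26.4x higher throughput.
Self-study also extends the model's effective context length (e.g. from 128k to 484k tokens on MTOB) and, surprisingly, leads to Cartridges that can be composed at inference time without retraining.

Summary (original)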

Large language models are often used to answer queries grounded in large text corpora (e.g. codebases, legal documents, or chat histories) by placing the entire corpus in the context window and leveraging in-context learning (ICL). Although current models support contexts of 100K-1M tokens, this setup is costly to serve because the memory consumption of the KV cache scales with input length. We explore an alternative: training a smaller KV cache offline on each corpus. At inference time, we load this trained KV cache, which we call a Cartridge, and decode a response. Critically, the cost of training a Cartridge can be amortized across all the queries referencing the same corpus. However, we find that the naive approach of training the Cartridge with next-token prediction on the corpus is not competitive with ICL. Instead, we propose self-study, a training recipe in which we generate synthetic conversations about the corpus and train the Cartridge with a context-distillation objective. We find that Cartridges trained with self-study replicate the functionality of ICL, while being significantly cheaper to serve. On challenging long-context benchmarks, Cartridges trained with self-study match ICL performance while using 38.6x less memory and enabling 26.4x higher throughput. Self-study also extends the model’s effective context length (e.g. from 128k to 484k tokens on MTOB) and surprisingly, leads to Cartridges that can be composed at inference time without retraining.
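
The statement that KV-cache memory scales with input length follows from the usual size formula for a decoder-only Transformer: two cached tensors (keys and values) per layer, each holding one vector per token per KV head. The short calculation below works this out. The model dimensions are hypothetical and are not taken from the paper; only the 38.6x reduction factor comes from the abstract, and applying it to this made-up configuration is purely illustrative.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


# Hypothetical model configuration (an assumption, not from the paper).
HYPOTHETICAL_MODEL = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2)

if __name__ == "__main__":
    full = kv_cache_bytes(128_000, **HYPOTHETICAL_MODEL)
    print(f"full-context cache: {full / 2**30:.2f} GiB")
    # Illustrative only: the abstract's 38.6x memory reduction applied to this size.
    print(f"38.6x smaller cache: {full / 38.6 / 2**30:.2f} GiB")
```

For this hypothetical configuration the cache costs 128 KiB per token, so a 128,000-token context needs about 15.6 GiB per sequence, and doubling the context doubles the cost. That linear growth is what makes a small trained cache attractive when many queries share one corpus.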

arXiv information

Authors	Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Liu, Will Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re
Published	2025-06-09 05:21:52+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots

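Posted: June 9, 2025 | Author: jarxiv

Summary

Adaptive recovery from falls is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery.
However, traditional methods that rely on preplanned recovery motions, simplified dynamics, or sparse rewards often fail to produce robust recovery policies.
This paper presents a learning-based framework that integrates Episode-based Dynamic Reward Shaping with curriculum learning, dynamically balancing exploration of diverse recovery maneuvers against precise posture refinement.
An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties.
We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms.
Extensive evaluations on two distinct quadruped platforms achieve recovery success rates of up to 99.1% and 97.8% without platform-specific tuning.
The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/

Summary (original)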

Adaptive recovery from fall incidents are essential skills for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/

arXiv information

Authors	Boyuan Deng, Luca Rossini, Jin Wang, Weijie Wang, Nikolaos Tsagarakis
Published	2025-06-05 18:58:12+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO

An Integrated Visual Servoing Framework for Precise Robotic Pruning Operations in Modern Commercial Orchard

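Posted: June 9, 2025 | Author: jarxiv

Summary

This study presents a vision-guided robotic control system for automated fruit tree pruning.
Traditional pruning practices are labor-intensive and limit agricultural efficiency and scalability, highlighting the need for advanced automation.
A key challenge is the precise, robust positioning of the cutting tool in complex orchard environments, where dense branches and occlusions make target access difficult.
To address this, an Intel RealSense D435 camera is mounted on the flange of a UR5e robotic arm, and CoTracker3, a transformer-based point tracker, is used for visual servoing control that centers tracked points in the camera view.
The system integrates proportional control with iterative inverse kinematics to achieve precise end-effector positioning.
The system was validated in Gazebo simulation, achieving a 77.77% success rate within a 5mm positional tolerance and a 100% success rate within a 10mm tolerance, with a mean end-effector error of 4.28 +/- 1.36 mm.
The vision controller demonstrated robust performance across diverse target positions within the pixel workspace.
The results validate the effectiveness of integrating vision-based tracking with kinematic control for precision agricultural tasks.
Future work will focus on real-world implementation and the integration of force sensing for actual cutting operations.

Summary (original)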

This study presents a vision-guided robotic control system for automated fruit tree pruning applications. Traditional pruning practices are labor-intensive and limit agricultural efficiency and scalability, highlighting the need for advanced automation. A key challenge is the precise, robust positioning of the cutting tool in complex orchard environments, where dense branches and occlusions make target access difficult. To address this, an Intel RealSense D435 camera is mounted on the flange of a UR5e robotic arm and CoTracker3, a transformer-based point tracker, is utilized for visual servoing control that centers tracked points in the camera view. The system integrates proportional control with iterative inverse kinematics to achieve precise end-effector positioning. The system was validated in Gazebo simulation, achieving a 77.77% success rate within 5mm positional tolerance and 100% success rate within 10mm tolerance, with a mean end-effector error of 4.28 +/- 1.36 mm. The vision controller demonstrated robust performance across diverse target positions within the pixel workspace. The results validate the effectiveness of integrating vision-based tracking with kinematic control for precision agricultural tasks. Future work will focus on real-world implementation and the integration of force sensing for actual cutting operations.
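
The proportional control that "centers tracked points in the camera view" can be sketched as a simple loop on the pixel error. This is a deliberately simplified toy, not the paper's controller: it assumes the commanded correction moves the tracked point one-to-one in the image (no camera model, no inverse kinematics, no robot dynamics), and the image center, `gain`, and `tol` values are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def p_control_step(point: Point, center: Point, gain: float) -> Point:
    """Proportional command: move against the pixel error, scaled by `gain`."""
    error_u = point[0] - center[0]
    error_v = point[1] - center[1]
    return (-gain * error_u, -gain * error_v)


def servo_to_center(
    point: Point,
    center: Point = (320.0, 240.0),  # hypothetical center of a 640x480 image
    gain: float = 0.5,
    tol: float = 1.0,
    max_iters: int = 100,
) -> Tuple[Point, int]:
    """Iterate P-control until the tracked point is within `tol` pixels of center."""
    u, v = point
    for i in range(max_iters):
        if math.hypot(u - center[0], v - center[1]) <= tol:
            return (u, v), i
        du, dv = p_control_step((u, v), center, gain)
        # Toy assumption: the command shifts the point directly in the image.
        u, v = u + du, v + dv
    return (u, v), max_iters


if __name__ == "__main__":
    final, iterations = servo_to_center((600.0, 50.0))
    print(final, iterations)
```

With a gain between 0 and 1 the pixel error shrinks geometrically each iteration. In the real system, each image-space correction would instead be converted into an end-effector motion and solved through inverse kinematics, as the abstract describes.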

arXiv information

Authors	Dawood Ahmed, Basit Muhammad Imran, Martin Churuvija, Manoj Karkee
Published	2025-06-05 19:01:30+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.RO
