jarxiv | Japanese arxiv | Page 1176

Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark

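Posted: March 27, 2025 | Author: jarxiv

Summary

Rapid progress in large language models (LLMs) has raised interest in deploying them on mobile devices for on-device AI applications.
Mobile users interact with LLMs differently from desktop users, which creates distinct expectations and data biases.
Current benchmark datasets mainly target server and desktop environments, and extensive datasets designed specifically for mobile contexts are notably lacking.
In addition, mobile devices face strict limits on storage and computing resources that constrain model size and capability, so optimized efficiency and prioritized knowledge are required.
To address these challenges, we introduce Mobile-MMLU, a large-scale benchmark dataset tailored to mobile intelligence.
It consists of 16,186 questions across 80 mobile-related fields, designed to evaluate LLM performance in realistic mobile scenarios.
A challenging subset, Mobile-MMLU-Pro, provides advanced evaluation at a size similar to MMLU-Pro and is considerably harder than the standard full set.
Both benchmarks use multiple-choice, order-invariant questions focused on practical mobile interactions such as recipe suggestions, travel planning, and essential daily tasks.
The dataset emphasizes important mobile-specific metrics such as inference latency, energy consumption, memory usage, and response quality, giving comprehensive insight into model performance under mobile constraints.
It also prioritizes privacy and adaptability, assessing a model's ability to process on-device, preserve user privacy, and adapt to personalized usage patterns.
The Mobile-MMLU family offers a standardized framework for developing and comparing mobile-optimized LLMs, enabling advances in productivity and decision-making in mobile computing environments.
Code and data are available at https://github.com/VILA-Lab/Mobile-MMLU.

Abstract (original)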

Rapid advancements in large language models (LLMs) have increased interest in deploying them on mobile devices for on-device AI applications. Mobile users interact differently with LLMs compared to desktop users, creating unique expectations and data biases. Current benchmark datasets primarily target at server and desktop environments, and there is a notable lack of extensive datasets specifically designed for mobile contexts. Additionally, mobile devices face strict limitations in storage and computing resources, constraining model size and capabilities, thus requiring optimized efficiency and prioritized knowledge. To address these challenges, we introduce Mobile-MMLU, a large-scale benchmark dataset tailored for mobile intelligence. It consists of 16,186 questions across 80 mobile-related fields, designed to evaluate LLM performance in realistic mobile scenarios. A challenging subset, Mobile-MMLU-Pro, provides advanced evaluation similar in size to MMLU-Pro but significantly more difficult than our standard full set. Both benchmarks use multiple-choice, order-invariant questions focused on practical mobile interactions, such as recipe suggestions, travel planning, and essential daily tasks. The dataset emphasizes critical mobile-specific metrics like inference latency, energy consumption, memory usage, and response quality, offering comprehensive insights into model performance under mobile constraints. Moreover, it prioritizes privacy and adaptability, assessing models’ ability to perform on-device processing, maintain user privacy, and adapt to personalized usage patterns. Mobile-MMLU family offers a standardized framework for developing and comparing mobile-optimized LLMs, enabling advancements in productivity and decision-making within mobile computing environments. Our code and data are available at: https://github.com/VILA-Lab/Mobile-MMLU.
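
The abstract does not say how order invariance is checked, so the following is only a minimal sketch of one possible reading: a question counts as order-invariant for a model if the model picks the correct option under every ordering of the choices. The function `is_order_invariant`, the `answer_fn` wrapper, and the two toy models are hypothetical names introduced for illustration and are not taken from the Mobile-MMLU code.

```python
import itertools


def is_order_invariant(answer_fn, question, options, correct):
    """Return True if answer_fn selects `correct` under every ordering of options.

    answer_fn(question, options) is a hypothetical model wrapper that returns
    the index of the chosen option.
    """
    for perm in itertools.permutations(options):
        picked = answer_fn(question, list(perm))
        if perm[picked] != correct:
            return False
    return True


def first_option_model(question, options):
    # Toy model with a position bias: always picks the first option.
    return 0


def content_model(question, options):
    # Toy model that answers by content, so option order does not matter.
    return options.index("Boil water first")


question = "What is the first step when cooking pasta?"
options = ["Boil water first", "Add sauce", "Drain the pasta", "Serve"]

biased_ok = is_order_invariant(first_option_model, question, options, options[0])
robust_ok = is_order_invariant(content_model, question, options, options[0])
```

With four options the check runs 24 orderings per question, which is cheap for a toy model but may need sampling rather than full enumeration for a real LLM.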

arXiv information

Authors	Sondos Mahmoud Bsharat,Mukul Ranjan,Aidar Myrzakhan,Jiacheng Liu,Bowei Guo,Shengkun Tang,Zhuang Liu,Yuanzhi Li,Zhiqiang Shen
Published	2025-03-26 17:59:56+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Diffusion Counterfactuals for Image Regressors

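Posted: March 27, 2025 | Author: jarxiv

Summary

Counterfactual explanations have been applied successfully to produce human-interpretable explanations for a variety of black-box models.
They are useful for tasks in the image domain, where explanation quality benefits from recent advances in generative models.
Although counterfactual explanations have been widely applied to classification models, their application to regression tasks remains underexplored.
We present two methods that use diffusion-based generative models to create counterfactual explanations for image regression tasks and address challenges in sparsity and quality: 1) one based on a Denoising Diffusion Probabilistic Model operating directly in pixel space, and 2) another based on a Diffusion Autoencoder operating in latent space.
Both produce realistic, semantic, and smooth counterfactuals on CelebA-HQ and a synthetic dataset, give easily interpretable insight into the regression model's decision-making process, and reveal spurious correlations.
For regression counterfactuals, we find that the changes in features depend on the region of the predicted value.
Significant changes in the predicted value require large semantic changes, which makes sparse counterfactuals harder to find than for classifiers.
Furthermore, pixel-space counterfactuals are sparser, while latent-space counterfactuals are of higher quality and allow larger semantic changes.

Abstract (original)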

Counterfactual explanations have been successfully applied to create human interpretable explanations for various black-box models. They are handy for tasks in the image domain, where the quality of the explanations benefits from recent advances in generative models. Although counterfactual explanations have been widely applied to classification models, their application to regression tasks remains underexplored. We present two methods to create counterfactual explanations for image regression tasks using diffusion-based generative models to address challenges in sparsity and quality: 1) one based on a Denoising Diffusion Probabilistic Model that operates directly in pixel-space and 2) another based on a Diffusion Autoencoder operating in latent space. Both produce realistic, semantic, and smooth counterfactuals on CelebA-HQ and a synthetic data set, providing easily interpretable insights into the decision-making process of the regression model and reveal spurious correlations. We find that for regression counterfactuals, changes in features depend on the region of the predicted value. Large semantic changes are needed for significant changes in predicted values, making it harder to find sparse counterfactuals than with classifiers. Moreover, pixel space counterfactuals are more sparse while latent space counterfactuals are of higher quality and allow bigger semantic changes.

arXiv information

Authors	Trung Duc Ha,Sidney Bender
Published	2025-03-26 14:42:46+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG, stat.ML

Comparison of marker-less 2D image-based methods for infant pose estimation

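Posted: March 27, 2025 | Author: jarxiv

Summary

This study compares the performance of available generic and infant pose estimators for video-based automated general movement assessment (GMA), as well as the choice of viewing angle for optimal recordings, namely the conventional diagonal view used in GMA versus a top-down view.
We used 4500 annotated video frames from 75 recordings of spontaneous motor function in infants aged 4 to 26 weeks.
To determine which pose estimation method and camera angle give the best pose estimation accuracy for infants in a GMA-related setting, we computed and compared the distance to human annotations and the percentage of correct keypoints (PCK).
The results show that ViTPose, the best-performing generic model trained on adults, also performs best on infants.
On our infant dataset, using infant pose estimators brings no improvement over generic pose estimators.
However, retraining a generic model on our data improves pose estimation accuracy significantly.
Pose estimation accuracy from the top-down view is significantly better than from the diagonal view, especially for detecting the hip keypoints.
The results also indicate that infant pose estimators generalize poorly to other infant datasets, so care is needed when choosing an infant pose estimator and applying it to infant datasets it was not trained on.
Although the standard GMA method uses a diagonal view for assessment, pose estimation accuracy improves significantly with a top-down view.
This suggests that a top-down view should be included in recording setups for automated GMA research.

Abstract (original)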

In this study we compare the performance of available generic- and infant-pose estimators for a video-based automated general movement assessment (GMA), and the choice of viewing angle for optimal recordings, i.e., conventional diagonal view used in GMA vs. top-down view. We used 4500 annotated video-frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine which pose estimation method and camera angle yield the best pose estimation accuracy on infants in a GMA related setting, the distance to human annotations and the percentage of correct key-points (PCK) were computed and compared. The results show that the best performing generic model trained on adults, ViTPose, also performs best on infants. We see no improvement from using infant-pose estimators over the generic pose estimators on our infant dataset. However, when retraining a generic model on our data, there is a significant improvement in pose estimation accuracy. The pose estimation accuracy obtained from the top-down view is significantly better than that obtained from the diagonal view, especially for the detection of the hip key-points. The results also indicate limited generalization capabilities of infant-pose estimators to other infant datasets, which hints that one should be careful when choosing infant pose estimators and using them on infant datasets which they were not trained on. While the standard GMA method uses a diagonal view for assessment, pose estimation accuracy significantly improves using a top-down view. This suggests that a top-down view should be included in recording setups for automated GMA research.
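
As a rough illustration of the PCK metric named in the abstract, the sketch below counts a predicted keypoint as correct when it lies within a fraction `alpha` of a reference length from the human annotation. The abstract does not state which reference length or threshold the authors use, so `ref_length`, `alpha=0.2`, and the sample coordinates are assumptions made only for this example.

```python
import math


def pck(pred, truth, ref_length, alpha=0.2):
    """Percentage of correct keypoints for one frame, returned as a fraction.

    pred, truth: lists of (x, y) keypoints in the same order.
    ref_length: normalising length in pixels (an assumption, e.g. torso size).
    alpha: fraction of ref_length within which a keypoint counts as correct.
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and the same length")
    threshold = alpha * ref_length
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return hits / len(truth)


# Hypothetical frame with three keypoints and a 100-pixel reference length.
truth = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
pred = [(12.0, 11.0), (50.0, 75.0), (91.0, 20.0)]
score = pck(pred, truth, ref_length=100.0, alpha=0.2)
```

Here the second keypoint is 25 pixels off and exceeds the 20-pixel threshold, so two of the three keypoints count as correct.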

arXiv information

Authors	Lennart Jahn,Sarah Flügge,Dajie Zhang,Luise Poustka,Sven Bölte,Florentin Wörgötter,Peter B Marschik,Tomas Kulvicius
Published	2025-03-26 14:45:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV

IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting

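Posted: March 27, 2025 | Author: jarxiv

Summary

Recent pre-trained vision-language models (PT-VLMs) often face a Multi-Domain Class-Incremental Learning (MCIL) scenario in practice, in which the classes and domains of multi-modal tasks arrive incrementally.
Without access to previously learned tasks or to unseen tasks, memory-constrained MCIL suffers from forward and backward forgetting.
To alleviate these challenges, parameter-efficient fine-tuning (PEFT) techniques such as prompt tuning are used to adapt the PT-VLM to the diverse incrementally learned tasks.
To adapt effectively to new tasks, existing methods consider only the effect of PEFT strategy selection and neglect the influence of PEFT parameter settings (e.g., prompting).
This paper tackles the challenge of optimizing prompt design for diverse tasks in MCIL and proposes an Instance-Aware Prompting (IAP) framework.
Specifically, the Instance-Aware Gated Prompting (IA-GP) module strengthens adaptation to new tasks while mitigating forgetting by dynamically assigning prompts across transformer layers at the instance level.
Instance-Aware Class-Distribution-Driven Prompting (IA-CDDP) improves the task adaptation process by determining an accurate task-label-related confidence score for each instance.
Experimental evaluation across 11 datasets with three performance metrics demonstrates the effectiveness of the proposed method.
Code can be found at https://github.com/FerdinandZJU/IAP.

Abstract (original)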

Recent pre-trained vision-language models (PT-VLMs) often face a Multi-Domain Class-Incremental Learning (MCIL) scenario in practice, where several classes and domains of multi-modal tasks are incrementally arrived. Without access to previously learned tasks and unseen tasks, memory-constrained MCIL suffers from forward and backward forgetting. To alleviate the above challenges, parameter-efficient fine-tuning techniques (PEFT), such as prompt tuning, are employed to adapt the PT-VLM to the diverse incrementally learned tasks. To achieve effective new task adaptation, existing methods only consider the effect of PEFT strategy selection, but neglect the influence of PEFT parameter setting (e.g., prompting). In this paper, we tackle the challenge of optimizing prompt designs for diverse tasks in MCIL and propose an Instance-Aware Prompting (IAP) framework. Specifically, our Instance-Aware Gated Prompting (IA-GP) module enhances adaptation to new tasks while mitigating forgetting by dynamically assigning prompts across transformer layers at the instance level. Our Instance-Aware Class-Distribution-Driven Prompting (IA-CDDP) improves the task adaptation process by determining an accurate task-label-related confidence score for each instance. Experimental evaluations across 11 datasets, using three performance metrics, demonstrate the effectiveness of our proposed method. Code can be found at https://github.com/FerdinandZJU/IAP.

arXiv information

Authors	Hao Fu,Hanbin Zhao,Jiahua Dong,Chao Zhang,Hui Qian
Published	2025-03-26 14:59:23+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV

Unleashing Vecset Diffusion Model for Fast Shape Generation

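Posted: March 27, 2025 | Author: jarxiv

Summary

3D shape generation has flourished through the development of so-called "native" 3D diffusion, particularly the Vecset Diffusion Model (VDM).
Although recent advances show promising results in generating high-resolution 3D shapes, VDM still struggles with high-speed generation.
The difficulty lies not only in accelerating diffusion sampling but also in VAE decoding in VDM, areas that previous work has left under-explored.
To address these challenges, we present FlashVDM, a systematic framework for accelerating both the VAE and the DiT in VDM.
For the DiT, FlashVDM enables flexible diffusion sampling with as few as 5 inference steps at comparable quality, made possible by stabilizing consistency distillation with the newly introduced Progressive Flow Distillation.
For the VAE, we introduce a lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical Volume Decoding, and Efficient Network Design.
By exploiting the locality of the vecset and the sparsity of the shape surface in the volume, the decoder drastically lowers FLOPs and minimizes overall decoding overhead.
We apply FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo.
Systematic evaluation shows that the model significantly outperforms existing fast 3D generation methods, achieving performance comparable to the state of the art while cutting inference time by more than 45x for reconstruction and 32x for generation.
Code and models are available at https://github.com/Tencent/FlashVDM.

Abstract (original)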

3D shape generation has greatly flourished through the development of so-called ‘native’ 3D diffusion, particularly through the Vecset Diffusion Model (VDM). While recent advancements have shown promising results in generating high-resolution 3D shapes, VDM still struggles with high-speed generation. Challenges exist because of difficulties not only in accelerating diffusion sampling but also VAE decoding in VDM, areas under-explored in previous works. To address these challenges, we present FlashVDM, a systematic framework for accelerating both VAE and DiT in VDM. For DiT, FlashVDM enables flexible diffusion sampling with as few as 5 inference steps and comparable quality, which is made possible by stabilizing consistency distillation with our newly introduced Progressive Flow Distillation. For VAE, we introduce a lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical Volume Decoding, and Efficient Network Design. By exploiting the locality of the vecset and the sparsity of shape surface in the volume, our decoder drastically lowers FLOPs, minimizing the overall decoding overhead. We apply FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo. Through systematic evaluation, we show that our model significantly outperforms existing fast 3D generation methods, achieving comparable performance to the state-of-the-art while reducing inference time by over 45x for reconstruction and 32x for generation. Code and models are available at https://github.com/Tencent/FlashVDM.

arXiv information

Authors	Zeqiang Lai,Yunfei Zhao,Zibo Zhao,Haolin Liu,Fuyun Wang,Huiwen Shi,Xianghui Yang,Qingxiang Lin,Jingwei Huang,Yuhong Liu,Jie Jiang,Chunchao Guo,Xiangyu Yue
Published	2025-03-26 15:08:12+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV, eess.IV

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

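Posted: March 27, 2025 | Author: jarxiv

Summary

Integrating watermarking into the generation process of latent diffusion models (LDMs) simplifies detection and attribution of generated content.
Semantic watermarks such as Tree-Rings and Gaussian Shading are a new class of watermarking techniques that are easy to implement and highly robust against various perturbations.
However, our work demonstrates a fundamental security vulnerability in semantic watermarks.
We show that attackers can leverage unrelated models, even ones with different latent spaces and architectures (UNet vs. DiT), to carry out powerful and realistic forgery attacks.
Specifically, we design two watermark forgery attacks.
The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM so that it moves closer to the latent representation of a watermarked image.
We also show that this technique can be used for watermark removal.
The second attack generates new images carrying the target watermark by inverting a watermarked image and regenerating it with an arbitrary prompt.
Both attacks need only a single reference image with the target watermark.
Overall, our findings call the applicability of semantic watermarks into question by revealing that attackers can easily forge or remove them under realistic conditions.

Abstract (original)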

Integrating watermarking into the generation process of latent diffusion models (LDMs) simplifies detection and attribution of generated content. Semantic watermarks, such as Tree-Rings and Gaussian Shading, represent a novel class of watermarking techniques that are easy to implement and highly robust against various perturbations. However, our work demonstrates a fundamental security vulnerability of semantic watermarks. We show that attackers can leverage unrelated models, even with different latent spaces and architectures (UNet vs DiT), to perform powerful and realistic forgery attacks. Specifically, we design two watermark forgery attacks. The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM to get closer to the latent representation of a watermarked image. We also show that this technique can be used for watermark removal. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt. Both attacks just need a single reference image with the target watermark. Overall, our findings question the applicability of semantic watermarks by revealing that attackers can easily forge or remove these watermarks under realistic conditions.

arXiv information

Authors	Andreas Müller,Denis Lukovnikov,Jonas Thietke,Asja Fischer,Erwin Quiring
Published	2025-03-26 15:10:26+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CR, cs.CV

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

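Posted: March 27, 2025 | Author: jarxiv

Summary

Gaussian splatting enables fast novel view synthesis in static 3D environments.
However, reconstructing real-world environments remains challenging because distractors and occluders break the multi-view consistency assumption required for accurate 3D reconstruction.
Most existing methods rely on external semantic information from pre-trained models, which adds computational overhead as a pre-processing step or during optimization.
In this work, we propose DeSplat, a novel method that directly separates distractors from static scene elements based purely on volume rendering of Gaussian primitives.
To reconstruct view-specific distractors, we initialize Gaussians within each camera view so that the static 3D scene and the distractors are modeled separately in the alpha compositing stages.
DeSplat yields an explicit scene separation of static elements and distractors and achieves results comparable to prior distractor-free approaches without sacrificing rendering speed.
We demonstrate DeSplat's effectiveness on three benchmark datasets for distractor-free novel view synthesis.
See the project website at https://aaltoml.github.io/desplat/.

Abstract (original)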

Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat’s effectiveness on three benchmark data sets for distractor-free novel view synthesis. See the project website at https://aaltoml.github.io/desplat/.

arXiv information

Authors	Yihao Wang,Marcus Klasson,Matias Turkulainen,Shuzhe Wang,Juho Kannala,Arno Solin
Published	2025-03-26 15:13:24+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG

4DRGS: 4D Radiative Gaussian Splatting for Efficient 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images

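Posted: March 27, 2025 | Author: jarxiv

Summary

Reconstructing 3D vessel structures from sparse-view dynamic digital subtraction angiography (DSA) images enables accurate medical assessment while reducing radiation exposure.
Existing methods often produce suboptimal results or require excessive computation time.
In this work, we propose 4D radiative Gaussian splatting (4DRGS) to achieve high-quality reconstruction efficiently.
In detail, we represent the vessels with 4D radiative Gaussian kernels.
Each kernel has time-invariant geometry parameters, including position, rotation, and scale, to model the static vessel structure.
The time-dependent central attenuation of each kernel is predicted by a compact neural network to capture the temporally varying response of contrast agent flow.
We splat these Gaussian kernels to synthesize DSA images via X-ray rasterization and optimize the model against the real captured images.
The final 3D vessel volume is voxelized from the trained kernels.
In addition, we introduce accumulated attenuation pruning and bounded scaling activation to improve reconstruction quality.
Extensive experiments on real-world patient data show that 4DRGS achieves impressive results with 5 minutes of training, which is 32x faster than the state-of-the-art method.
This underscores the potential of 4DRGS for real-world clinics.

Abstract (original)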

Reconstructing 3D vessel structures from sparse-view dynamic digital subtraction angiography (DSA) images enables accurate medical assessment while reducing radiation exposure. Existing methods often produce suboptimal results or require excessive computation time. In this work, we propose 4D radiative Gaussian splatting (4DRGS) to achieve high-quality reconstruction efficiently. In detail, we represent the vessels with 4D radiative Gaussian kernels. Each kernel has time-invariant geometry parameters, including position, rotation, and scale, to model static vessel structures. The time-dependent central attenuation of each kernel is predicted from a compact neural network to capture the temporal varying response of contrast agent flow. We splat these Gaussian kernels to synthesize DSA images via X-ray rasterization and optimize the model with real captured ones. The final 3D vessel volume is voxelized from the well-trained kernels. Moreover, we introduce accumulated attenuation pruning and bounded scaling activation to improve reconstruction quality. Extensive experiments on real-world patient data demonstrate that 4DRGS achieves impressive results in 5 minutes training, which is 32x faster than the state-of-the-art method. This underscores the potential of 4DRGS for real-world clinics.

arXiv information

Authors	Zhentao Liu,Ruyi Zha,Huangxuan Zhao,Hongdong Li,Zhiming Cui
Published	2025-03-26 15:14:26+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, eess.IV

Intuitive Axial Augmentation Using Polar-Sine-Based Piecewise Distortion for Medical Slice-Wise Segmentation

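Posted: March 27, 2025 | Author: jarxiv

Summary

Most data-driven models for medical image analysis rely on universal augmentations to improve accuracy.
Experimental evidence has confirmed their effectiveness, but the unclear mechanism behind them is a barrier to broad acceptance of and trust in such methods within the medical community.
We revisit and acknowledge the characteristics that set medical images apart from traditional digital images, and accordingly propose a medical-specific augmentation algorithm that is more elastic and aligns well with the radiology scan procedure.
The method performs a piecewise affine transformation with sinusoidally distorted rays according to radius in polar coordinates, thereby simulating the uncertain postures of a person lying flat on the scanning table.
The method can generate human visceral distributions without affecting the fundamental relative positions on the axial plane.
Two non-adaptive algorithms, Meta-based Scan Table Removal and Similarity-Guided Parameter Search, are introduced to strengthen the robustness of the augmentation method.
In contrast to other methodologies, the method stands out for its intuitive design and ease of understanding for medical professionals, which enhances its applicability in clinical scenarios.
Experiments show that the method improves accuracy on two modalities across multiple well-known segmentation frameworks without requiring more data samples.
Preview code is available at https://github.com/MGAMZ/PSBPD.

Abstract (original)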

Most data-driven models for medical image analysis rely on universal augmentations to improve accuracy. Experimental evidence has confirmed their effectiveness, but the unclear mechanism underlying them poses a barrier to the widespread acceptance and trust in such methods within the medical community. We revisit and acknowledge the unique characteristics of medical images apart from traditional digital images, and consequently, proposed a medical-specific augmentation algorithm that is more elastic and aligns well with radiology scan procedure. The method performs piecewise affine with sinusoidal distorted ray according to radius on polar coordinates, thus simulating uncertain postures of human lying flat on the scanning table. Our method could generate human visceral distribution without affecting the fundamental relative position on axial plane. Two non-adaptive algorithms, namely Meta-based Scan Table Removal and Similarity-Guided Parameter Search, are introduced to bolster robustness of our augmentation method. In contrast to other methodologies, our method is highlighted for its intuitive design and ease of understanding for medical professionals, thereby enhancing its applicability in clinical scenarios. Experiments show our method improves accuracy with two modality across multiple famous segmentation frameworks without requiring more data samples. Our preview code is available in: https://github.com/MGAMZ/PSBPD.
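
To give a feel for what a sinusoidal distortion by radius in polar coordinates might look like, the sketch below rotates each point about a centre by an angle that varies sinusoidally with its radius. This is a simplified illustration and not the authors' algorithm: it omits the piecewise affine step, and the function name `polar_sine_warp` and the parameters `amplitude` and `period` are hypothetical.

```python
import math


def polar_sine_warp(x, y, center, amplitude, period):
    """Rotate a point about `center` by an angle that varies sinusoidally with radius.

    amplitude: maximum angular offset in radians (hypothetical parameter).
    period: radial distance in pixels over which the sine completes one cycle.
    """
    dx, dy = x - center[0], y - center[1]
    radius = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    theta_new = theta + amplitude * math.sin(2.0 * math.pi * radius / period)
    return (
        center[0] + radius * math.cos(theta_new),
        center[1] + radius * math.sin(theta_new),
    )


# A point at a quarter period from the centre receives the full angular offset.
warped_x, warped_y = polar_sine_warp(10.0, 0.0, (0.0, 0.0), amplitude=0.1, period=40.0)
```

Because only the angle changes, every point keeps its distance from the centre, so straight rays from the centre bend into sinusoidal curves while radial ordering is left intact.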

arXiv information

Authors	Yiqin Zhang,Qingkui Chen,Chen Huang,Zhengjie Zhang,Meiling Chen,Zhibing Fu
Published	2025-03-26 15:19:56+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV

Robust Flower Cluster Matching Using The Unscented Transform

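Posted: March 27, 2025 | Author: jarxiv

Summary

Monitoring flowers over time is essential for precision robotic pollination in agriculture.
To accomplish this, stationary RGB-D cameras can provide continuous spatio-temporal observation of plant growth.
However, image registration becomes a serious challenge because of changes in the plant's visual appearance caused by the pollination process and occlusions from growth and camera angles.
Plants flower in a manner that produces distinct clusters on branches.
This paper presents a method for matching flower clusters using descriptors generated from RGB-D data, while allowing for spatial uncertainty within the cluster.
The proposed approach leverages the Unscented Transform to estimate plant descriptor uncertainty tolerances efficiently, enabling a robust image-registration process despite temporal changes.
The Unscented Transform handles the nonlinear transformations by propagating the uncertainty of flower positions to determine the variation in the descriptor domain.
A Monte Carlo simulation is used to validate the Unscented Transform results, confirming the method's effectiveness for flower cluster matching.
It can therefore facilitate improved robotic pollination in dynamic environments.

Abstract (original)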

Monitoring flowers over time is essential for precision robotic pollination in agriculture. To accomplish this, a continuous spatial-temporal observation of plant growth can be done using stationary RGB-D cameras. However, image registration becomes a serious challenge due to changes in the visual appearance of the plant caused by the pollination process and occlusions from growth and camera angles. Plants flower in a manner that produces distinct clusters on branches. This paper presents a method for matching flower clusters using descriptors generated from RGB-D data and considers allowing for spatial uncertainty within the cluster. The proposed approach leverages the Unscented Transform to efficiently estimate plant descriptor uncertainty tolerances, enabling a robust image-registration process despite temporal changes. The Unscented Transform is used to handle the nonlinear transformations by propagating the uncertainty of flower positions to determine the variations in the descriptor domain. A Monte Carlo simulation is used to validate the Unscented Transform results, confirming our method’s effectiveness for flower cluster matching. Therefore, it can facilitate improved robotics pollination in dynamic environments.
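
The idea of propagating position uncertainty through a nonlinear mapping and checking it against Monte Carlo sampling can be sketched as follows. This is a generic textbook Unscented Transform (2n + 1 sigma points with a single `kappa` parameter), not the authors' implementation. The range-and-bearing function stands in for a descriptor, and the mean, covariance, and sample count are assumptions chosen only for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T == a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def unscented_transform(mean, cov, func, kappa=1.0):
    """Propagate (mean, cov) through func using 2n + 1 sigma points."""
    n = len(mean)
    scale = n + kappa
    root = cholesky([[scale * c for c in row] for row in cov])
    points = [list(mean)]
    for j in range(n):
        col = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    weights = [kappa / scale] + [0.5 / scale] * (2 * n)
    outs = [func(p) for p in points]
    dim = len(outs[0])
    out_mean = [sum(w * y[i] for w, y in zip(weights, outs)) for i in range(dim)]
    out_cov = [
        [
            sum(w * (y[i] - out_mean[i]) * (y[j] - out_mean[j]) for w, y in zip(weights, outs))
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return out_mean, out_cov


def monte_carlo_mean(mean, cov, func, samples=20000, seed=0):
    """Reference estimate of the output mean by sampling Gaussian inputs."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(mean)
    totals = None
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        y = func(x)
        totals = y if totals is None else [t + v for t, v in zip(totals, y)]
    return [t / samples for t in totals]


def range_bearing(p):
    # Hypothetical descriptor: distance and angle of a flower from the camera origin.
    return [math.hypot(p[0], p[1]), math.atan2(p[1], p[0])]


flower_mean = [2.0, 1.0]                    # assumed flower position in metres
flower_cov = [[0.01, 0.002], [0.002, 0.02]]  # assumed position covariance
ut_mean, ut_cov = unscented_transform(flower_mean, flower_cov, range_bearing)
mc_mean = monte_carlo_mean(flower_mean, flower_cov, range_bearing)
```

The Unscented Transform here evaluates the function at five points, while the Monte Carlo reference uses 20,000 samples, which is the efficiency argument behind using sigma points for this kind of tolerance estimate.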

arXiv information

Authors	Andy Chu,Rashik Shrestha,Yu Gu,Jason N. Gross
Published	2025-03-26 15:24:58+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.RO
