jarxiv | Japanese arxiv | Page 546

Robust Kidney Abnormality Segmentation: A Validation Study of an AI-Based Framework

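Posted: May 13, 2025 | Author: jarxiv

Summary

Kidney abnormality segmentation has considerable potential to improve clinical workflows, especially in settings that require quantitative assessment.
Kidney volume can serve as an important biomarker for renal disease, with changes in volume correlating directly with kidney function.
Current clinical practice often relies on subjective visual assessment of kidney size and abnormalities, including tumors and cysts, which are typically staged by diameter, volume, and anatomical location.
To support a more objective and reproducible approach, this study aims to develop a robust, thoroughly validated kidney abnormality segmentation algorithm that is publicly available for clinical and research use.
We use publicly available training datasets and the state-of-the-art medical image segmentation framework nnU-Net.
Validation uses both proprietary and public test datasets, and segmentation performance is quantified by the Dice coefficient and the 95th percentile Hausdorff distance.
In addition, we analyze robustness across subgroups defined by patient sex, age, CT contrast phase, and tumor histologic subtype.
Our findings show that the segmentation algorithm, trained only on publicly available data, generalizes effectively to external test sets and outperforms existing state-of-the-art models on all tested datasets.
Subgroup analyses show consistently high performance, indicating strong robustness and reliability.
The developed algorithm and associated code are publicly available at https://github.com/DIAGNijmegen/oncology-kidney-abnormality-segmentation.

Summary (original)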

Kidney abnormality segmentation has important potential to enhance the clinical workflow, especially in settings requiring quantitative assessments. Kidney volume could serve as an important biomarker for renal diseases, with changes in volume correlating directly with kidney function. Currently, clinical practice often relies on subjective visual assessment for evaluating kidney size and abnormalities, including tumors and cysts, which are typically staged based on diameter, volume, and anatomical location. To support a more objective and reproducible approach, this research aims to develop a robust, thoroughly validated kidney abnormality segmentation algorithm, made publicly available for clinical and research use. We employ publicly available training datasets and leverage the state-of-the-art medical image segmentation framework nnU-Net. Validation is conducted using both proprietary and public test datasets, with segmentation performance quantified by Dice coefficient and the 95th percentile Hausdorff distance. Furthermore, we analyze robustness across subgroups based on patient sex, age, CT contrast phases, and tumor histologic subtypes. Our findings demonstrate that our segmentation algorithm, trained exclusively on publicly available data, generalizes effectively to external test sets and outperforms existing state-of-the-art models across all tested datasets. Subgroup analyses reveal consistent high performance, indicating strong robustness and reliability. The developed algorithm and associated code are publicly accessible at https://github.com/DIAGNijmegen/oncology-kidney-abnormality-segmentation.
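The two metrics named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' evaluation code: the function names, the brute-force distance computation, the face-neighbor definition of a surface voxel, and the handling of empty masks are all assumptions made for illustration. HD95 definitions also vary between toolkits; this version takes the larger of the two directed 95th percentiles.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def surface_voxels(mask):
    """Coordinates of foreground voxels with at least one background face-neighbor."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)  # pad with background
    interior = m.copy()
    for axis in range(m.ndim):
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    core = tuple(slice(1, -1) for _ in range(m.ndim))
    return np.argwhere((m & ~interior)[core])


def hd95(pred, target, spacing=1.0):
    """95th percentile Hausdorff distance between mask surfaces (brute force).

    `spacing` is a scalar voxel size; only suitable for small masks because
    the full pairwise distance matrix is materialized.
    """
    p = surface_voxels(pred) * spacing
    t = surface_voxels(target) * spacing
    if len(p) == 0 or len(t) == 0:
        return float("inf")  # assumed convention when either mask is empty
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))
```

For real CT volumes a distance-transform-based implementation would be far more efficient; the brute-force form is kept here only because it makes the definition easy to read.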

arXiv information

Authors	Sarah de Boer, Hartmut Häntze, Kiran Vaidhya Venkadesh, Myrthe A. D. Buser, Gabriel E. Humpire Mamani, Lina Xu, Lisa C. Adams, Jawed Nawabi, Keno K. Bressem, Bram van Ginneken, Mathias Prokop, Alessa Hering
Published	2025-05-12 13:53:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CV

Evaluating Modern Visual Anomaly Detection Approaches in Semiconductor Manufacturing: A Comparative Study

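Posted: May 13, 2025 | Author: jarxiv

Summary

Semiconductor manufacturing is a complex, multistage process.
Automated visual inspection of scanning electron microscope (SEM) images is essential for minimizing equipment downtime and containing costs.
Most previous work considers supervised approaches, which assume a sufficient number of labeled anomalous samples.
In contrast, visual anomaly detection (VAD), an emerging research area, focuses on unsupervised learning, avoiding the costly defect collection phase while providing explanations for its predictions.
We introduce a benchmark for VAD in the semiconductor domain by leveraging the MIIC dataset.
Our results demonstrate the effectiveness of modern VAD approaches in this field.

Summary (original)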

Semiconductor manufacturing is a complex, multistage process. Automated visual inspection of Scanning Electron Microscope (SEM) images is indispensable for minimizing equipment downtime and containing costs. Most previous research considers supervised approaches, assuming a sufficient number of anomalously labeled samples. On the contrary, Visual Anomaly Detection (VAD), an emerging research domain, focuses on unsupervised learning, avoiding the costly defect collection phase while providing explanations of the predictions. We introduce a benchmark for VAD in the semiconductor domain by leveraging the MIIC dataset. Our results demonstrate the efficacy of modern VAD approaches in this field.

arXiv information

Authors	Manuel Barusco, Francesco Borsatti, Youssef Ben Khalifa, Davide Dalle Pezze, Gian Antonio Susto
Published	2025-05-12 13:56:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CV

Beyond Static Perception: Integrating Temporal Context into VLMs for Cloth Folding

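Posted: May 13, 2025 | Author: jarxiv

Summary

Manipulating clothes is challenging because of their complex dynamics, high deformability, and frequent self-occlusions.
Garments can take a nearly infinite number of configurations, which makes explicit state representations difficult to define.
In this paper, we analyze BiFold, a model that predicts language-conditioned pick-and-place actions from visual observations while implicitly encoding garment state through end-to-end learning.
To handle scenarios such as crumpled garments or recovery from failed manipulations, BiFold leverages temporal context to improve state estimation.
We examine the model's internal representations and present evidence that its fine-tuning and temporal context enable effective alignment between text and image regions, as well as temporal consistency.

Summary (original)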

Manipulating clothes is challenging due to their complex dynamics, high deformability, and frequent self-occlusions. Garments exhibit a nearly infinite number of configurations, making explicit state representations difficult to define. In this paper, we analyze BiFold, a model that predicts language-conditioned pick-and-place actions from visual observations, while implicitly encoding garment state through end-to-end learning. To address scenarios such as crumpled garments or recovery from failed manipulations, BiFold leverages temporal context to improve state estimation. We examine the internal representations of the model and present evidence that its fine-tuning and temporal context enable effective alignment between text and image regions, as well as temporal consistency.

arXiv information

Authors	Oriol Barbany, Adrià Colomé, Carme Torras
Published	2025-05-12 14:24:03+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.RO

Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions

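Posted: May 13, 2025 | Author: jarxiv

Summary

Traffic accident prediction and detection are critical for improving road safety, and vision-based traffic accident anticipation (Vision-TAA) has emerged as a promising approach in the era of deep learning.
This paper reviews 147 recent studies, focusing on the application of supervised, unsupervised, and hybrid deep learning models to accident prediction, alongside the use of real-world and synthetic datasets.
Current methodologies are grouped into four key approaches: image and video feature-based prediction, spatiotemporal feature-based prediction, scene understanding, and multimodal data fusion.
Although these methods show significant potential, challenges such as data scarcity, limited generalization to complex scenarios, and real-time performance constraints remain prevalent.
The review highlights opportunities for future research, including the integration of multimodal data fusion, self-supervised learning, and Transformer-based architectures to improve prediction accuracy and scalability.
By synthesizing existing advances and identifying critical gaps, the paper provides a foundational reference for developing robust and adaptive Vision-TAA systems, contributing to road safety and traffic management.

Summary (original)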

Traffic accident prediction and detection are critical for enhancing road safety, and vision-based traffic accident anticipation (Vision-TAA) has emerged as a promising approach in the era of deep learning. This paper reviews 147 recent studies, focusing on the application of supervised, unsupervised, and hybrid deep learning models for accident prediction, alongside the use of real-world and synthetic datasets. Current methodologies are categorized into four key approaches: image and video feature-based prediction, spatiotemporal feature-based prediction, scene understanding, and multimodal data fusion. While these methods demonstrate significant potential, challenges such as data scarcity, limited generalization to complex scenarios, and real-time performance constraints remain prevalent. This review highlights opportunities for future research, including the integration of multimodal data fusion, self-supervised learning, and Transformer-based architectures to enhance prediction accuracy and scalability. By synthesizing existing advancements and identifying critical gaps, this paper provides a foundational reference for developing robust and adaptive Vision-TAA systems, contributing to road safety and traffic management.

arXiv information

Authors	Yi Zhang, Wenye Zhou, Ruonan Lin, Xin Yang, Hao Zheng
Published	2025-05-12 14:34:22+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Higher-Order Convolution Improves Neural Predictivity in the Retina

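Posted: May 13, 2025 | Author: jarxiv

Summary

We present a novel approach to neural response prediction that incorporates higher-order operations directly within convolutional neural networks (CNNs).
Our model extends traditional 3D CNNs by embedding higher-order operations in the convolutional operator itself, enabling direct modeling of multiplicative interactions between neighboring pixels across space and time.
This increases the representational power of CNNs without increasing their depth, and so addresses the architectural disparity between deep artificial networks and the relatively shallow processing hierarchy of biological visual systems.
We evaluate the approach on two distinct datasets: salamander retinal ganglion cell (RGC) responses to natural scenes, and a new dataset of mouse RGC responses to controlled geometric transformations.
Our higher-order CNN (HoCNN) achieves superior performance while requiring only half the training data of standard architectures, with correlation coefficients of up to 0.75 with neural responses (against a retinal reliability of 0.80$\pm$0.02).
When integrated into state-of-the-art architectures, our approach consistently improves performance across different species and stimulus conditions.
Analysis of the learned representations shows that the network naturally encodes fundamental geometric transformations, particularly the scaling parameters that characterize object expansion and contraction.
This capability is especially relevant for specific cell types, such as transient OFF-alpha and transient ON cells, which are known to detect looming objects and object motion respectively, and for which our model shows marked improvement in response prediction.
The correlation coefficients for scaling parameters are more than twice as high for HoCNN (0.72) as for baseline models (0.32).

Summary (original)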

We present a novel approach to neural response prediction that incorporates higher-order operations directly within convolutional neural networks (CNNs). Our model extends traditional 3D CNNs by embedding higher-order operations within the convolutional operator itself, enabling direct modeling of multiplicative interactions between neighboring pixels across space and time. Our model increases the representational power of CNNs without increasing their depth, therefore addressing the architectural disparity between deep artificial networks and the relatively shallow processing hierarchy of biological visual systems. We evaluate our approach on two distinct datasets: salamander retinal ganglion cell (RGC) responses to natural scenes, and a new dataset of mouse RGC responses to controlled geometric transformations. Our higher-order CNN (HoCNN) achieves superior performance while requiring only half the training data compared to standard architectures, demonstrating correlation coefficients up to 0.75 with neural responses (against 0.80$\pm$0.02 retinal reliability). When integrated into state-of-the-art architectures, our approach consistently improves performance across different species and stimulus conditions. Analysis of the learned representations reveals that our network naturally encodes fundamental geometric transformations, particularly scaling parameters that characterize object expansion and contraction. This capability is especially relevant for specific cell types, such as transient OFF-alpha and transient ON cells, which are known to detect looming objects and object motion respectively, and where our model shows marked improvement in response prediction. The correlation coefficients for scaling parameters are more than twice as high in HoCNN (0.72) compared to baseline models (0.32).
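To make "multiplicative interactions between neighboring pixels" concrete, here is a minimal one-dimensional sketch of a second-order (Volterra-style) filter. It is an assumption-laden illustration and not the HoCNN operator from the paper: the function name, the 1D setting, and the specific form (a linear kernel `w1` plus a quadratic kernel `w2` over each sliding window) are chosen only to show how products of nearby inputs can enter a convolution-like operation.

```python
import numpy as np


def second_order_conv1d(x, w1, w2):
    """y[t] = sum_i w1[i] * x[t+i] + sum_{i,j} w2[i,j] * x[t+i] * x[t+j].

    Hypothetical illustration: w1 has shape (k,), w2 has shape (k, k),
    and the output covers only fully valid windows (no padding).
    """
    x = np.asarray(x, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    k = w1.shape[0]
    # All length-k sliding windows of x, shape (len(x) - k + 1, k).
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    linear = windows @ w1  # ordinary (first-order) correlation term
    # Quadratic term: weighted products of pairs of inputs inside each window.
    quadratic = np.einsum("ti,ij,tj->t", windows, w2, windows)
    return linear + quadratic
```

With `w2` set to zero this reduces to an ordinary sliding-window correlation, which is one way to see that the quadratic kernel adds expressive power to a single layer rather than adding depth.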

arXiv information

Authors	Simone Azeglio, Victor Calbiague Garcia, Guilhem Glaziou, Peter Neri, Olivier Marre, Ulisse Ferrari
Published	2025-05-12 14:43:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.LG, q-bio.NC

A Unified Hierarchical Framework for Fine-grained Cross-view Geo-localization over Large-scale Scenarios

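Posted: May 13, 2025 | Author: jarxiv

Summary

Cross-view geo-localization is a promising solution to large-scale localization problems; it requires running retrieval and metric localization tasks in sequence to achieve fine-grained predictions.
However, existing methods typically design standalone models for these two tasks, which leads to inefficient collaboration and increased training overhead.
In this paper, we propose UnifyGeo, a novel unified hierarchical geo-localization framework that integrates the retrieval and metric localization tasks into a single network.
Specifically, we first adopt a unified learning strategy with shared parameters to jointly learn multi-granularity representations, promoting mutual reinforcement between the two tasks.
We then design a re-ranking mechanism guided by a dedicated loss function, which improves geo-localization performance by improving both retrieval accuracy and the references used for metric localization.
Extensive experiments show that UnifyGeo significantly outperforms the state of the art in both task-isolated and task-associated settings.
Notably, on the challenging VIGOR benchmark, which supports fine-grained localization evaluation, the 1-meter-level localization recall improves from 1.53% to 39.64% under same-area evaluation and from 0.43% to 25.58% under cross-area evaluation.
The code will be made publicly available.

Summary (original)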

Cross-view geo-localization is a promising solution for large-scale localization problems, requiring the sequential execution of retrieval and metric localization tasks to achieve fine-grained predictions. However, existing methods typically focus on designing standalone models for these two tasks, resulting in inefficient collaboration and increased training overhead. In this paper, we propose UnifyGeo, a novel unified hierarchical geo-localization framework that integrates retrieval and metric localization tasks into a single network. Specifically, we first employ a unified learning strategy with shared parameters to jointly learn multi-granularity representation, facilitating mutual reinforcement between these two tasks. Subsequently, we design a re-ranking mechanism guided by a dedicated loss function, which enhances geo-localization performance by improving both retrieval accuracy and metric localization references. Extensive experiments demonstrate that UnifyGeo significantly outperforms the state-of-the-arts in both task-isolated and task-associated settings. Remarkably, on the challenging VIGOR benchmark, which supports fine-grained localization evaluation, the 1-meter-level localization recall rate improves from 1.53\% to 39.64\% and from 0.43\% to 25.58\% under same-area and cross-area evaluations, respectively. Code will be made publicly available.
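The "1-meter-level localization recall" reported above can be read as the fraction of queries whose predicted position lies within one meter of the ground truth. The sketch below shows that reading; it is an assumption about the metric's form rather than the VIGOR benchmark's official evaluation script, and the function name, the planar metric coordinates, and the inclusive threshold are hypothetical.

```python
import numpy as np


def recall_at_distance(pred_xy, true_xy, threshold_m=1.0):
    """Fraction of queries localized within `threshold_m` meters of ground truth.

    Assumes both inputs are (N, 2) arrays of planar coordinates in meters.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    true_xy = np.asarray(true_xy, dtype=float)
    errors = np.linalg.norm(pred_xy - true_xy, axis=1)  # per-query error
    return float(np.mean(errors <= threshold_m))
```

Because the threshold is so tight, this metric only credits a query when both stages succeed: retrieval must select a usable reference and metric localization must then be accurate to within a meter.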

arXiv information

Authors	Zhuo Song, Ye Zhang, Kunhong Li, Longguang Wang, Yulan Guo
Published	2025-05-12 14:44:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

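Posted: May 13, 2025 | Author: jarxiv

Summary

The rapid evolution of artificial intelligence (AI) has moved from static, data-driven models to dynamic systems that can perceive and interact with real-world environments.
Despite advances in pattern recognition and symbolic reasoning, current AI systems such as large language models remain disembodied and cannot physically engage with the world.
This limitation has driven the rise of embodied AI, in which autonomous agents such as humanoid robots must navigate and manipulate unstructured environments with human-like adaptability.
At the core of this challenge is the concept of the Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability.
A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities.
Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments.
This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of the Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment.
To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization.
We also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence.
By synthesizing insights from neuroscience, we outline a roadmap toward generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.

Summary (original)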

The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the rise of embodied AI, where autonomous agents, such as humanoid robots, must navigate and manipulate unstructured environments with human-like adaptability. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities. Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment. To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization. Furthermore, we also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence. By synthesizing insights from neuroscience, we outline a roadmap towards the development of generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.

arXiv information

Authors	Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang
Published	2025-05-12 15:05:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, cs.RO

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

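Posted: May 13, 2025 | Author: jarxiv

Summary

Current diffusion-based text-to-video methods are limited to producing short, single-shot video clips and cannot generate multi-shot videos with discrete transitions in which the same character performs distinct activities across the same or different backgrounds.
To address this limitation, we propose a framework that includes a dataset collection pipeline and architectural extensions to video diffusion models to enable text-to-multi-shot video generation.
Our approach generates a multi-shot video as a single video with full attention across all frames of all shots, which ensures character and background consistency, and lets users control the number, duration, and content of shots through shot-specific conditioning.
This is achieved by incorporating a transition token into the text-to-video model to control the frames at which a new shot begins, together with a local attention masking strategy that controls the transition token's effect and allows shot-specific prompting.
To obtain training data, we propose a novel data collection pipeline that constructs a multi-shot video dataset from existing single-shot video datasets.
Extensive experiments show that fine-tuning a pre-trained text-to-video model for a few thousand iterations is enough for it to generate multi-shot videos with shot-specific control, outperforming the baselines.
More details are available at https://shotadapter.github.io/.

Summary (original)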

Current diffusion-based text-to-video methods are limited to producing short video clips of a single shot and lack the capability to generate multi-shot videos with discrete transitions where the same character performs distinct activities across the same or different backgrounds. To address this limitation we propose a framework that includes a dataset collection pipeline and architectural extensions to video diffusion models to enable text-to-multi-shot video generation. Our approach enables generation of multi-shot videos as a single video with full attention across all frames of all shots, ensuring character and background consistency, and allows users to control the number, duration, and content of shots through shot-specific conditioning. This is achieved by incorporating a transition token into the text-to-video model to control at which frames a new shot begins and a local attention masking strategy which controls the transition token’s effect and allows shot-specific prompting. To obtain training data we propose a novel data collection pipeline to construct a multi-shot video dataset from existing single-shot video datasets. Extensive experiments demonstrate that fine-tuning a pre-trained text-to-video model for a few thousand iterations is enough for the model to subsequently be able to generate multi-shot videos with shot-specific control, outperforming the baselines. You can find more details in https://shotadapter.github.io/
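One way to picture shot-specific prompting through attention masking is a boolean mask that lets the frames of each shot attend only to the text tokens of that shot's prompt, while frame-to-frame attention stays unrestricted. The sketch below builds such a mask. It is a hypothetical illustration of the general idea, not ShotAdapter's actual masking scheme: the function name, the token layout (per-shot prompts concatenated in shot order), and the omission of the transition token are all assumptions.

```python
import numpy as np


def build_shot_text_mask(frames_per_shot, text_tokens_per_shot):
    """Boolean mask of shape (total_frames, total_text_tokens).

    mask[i, j] is True when frame i and text token j belong to the same shot,
    i.e. when frame i is allowed to attend to text token j.
    """
    if len(frames_per_shot) != len(text_tokens_per_shot):
        raise ValueError("one frame count and one prompt length per shot")
    shot_ids = np.arange(len(frames_per_shot))
    frame_shot = np.repeat(shot_ids, frames_per_shot)      # shot index per frame
    text_shot = np.repeat(shot_ids, text_tokens_per_shot)  # shot index per token
    return frame_shot[:, None] == text_shot[None, :]
```

In an attention layer, entries where the mask is False would typically have their scores set to a large negative value before the softmax, so each shot's frames are conditioned only on that shot's prompt.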

arXiv information

Authors	Ozgur Kara, Krishna Kumar Singh, Feng Liu, Duygu Ceylan, James M. Rehg, Tobias Hinz
Published	2025-05-12 15:22:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Breast Cancer Classification in Deep Ultraviolet Fluorescence Images Using a Patch-Level Vision Transformer Framework

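Posted: May 13, 2025 | Author: jarxiv

Summary

Breast-conserving surgery (BCS) aims to completely remove malignant lesions while preserving as much healthy tissue as possible.
Intraoperative margin assessment is essential for balancing thorough cancer resection against tissue conservation.
A deep ultraviolet fluorescence scanning microscope (DUV-FSM) enables rapid acquisition of whole surface images (WSIs) of excised tissue, providing contrast between malignant and normal tissue.
However, breast cancer classification with DUV WSIs is challenging because of their high resolution and complex histopathological features.
This study introduces a DUV WSI classification framework that uses a patch-level vision transformer (ViT) model to capture local and global features.
Grad-CAM++ saliency weighting highlights relevant spatial regions, improves the interpretability of results, and improves diagnostic accuracy for benign and malignant tissue classification.
A comprehensive 5-fold cross-validation shows that the proposed approach significantly outperforms conventional deep learning methods, achieving a classification accuracy of 98.33%.

Summary (original)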

Breast-conserving surgery (BCS) aims to completely remove malignant lesions while maximizing healthy tissue preservation. Intraoperative margin assessment is essential to achieve a balance between thorough cancer resection and tissue conservation. A deep ultraviolet fluorescence scanning microscope (DUV-FSM) enables rapid acquisition of whole surface images (WSIs) for excised tissue, providing contrast between malignant and normal tissues. However, breast cancer classification with DUV WSIs is challenged by high resolutions and complex histopathological features. This study introduces a DUV WSI classification framework using a patch-level vision transformer (ViT) model, capturing local and global features. Grad-CAM++ saliency weighting highlights relevant spatial regions, enhances result interpretability, and improves diagnostic accuracy for benign and malignant tissue classification. A comprehensive 5-fold cross-validation demonstrates the proposed approach significantly outperforms conventional deep learning methods, achieving a classification accuracy of 98.33%.
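A simple way to combine patch-level predictions with saliency weights is a weighted average, where patches in more salient regions contribute more to the image-level score. The sketch below shows that idea only. It is not the paper's pipeline: the function name, the use of one scalar saliency value per patch, and the fallback to a plain mean are assumptions, and computing the Grad-CAM++ maps themselves is out of scope here.

```python
import numpy as np


def saliency_weighted_score(patch_probs, patch_saliency):
    """Image-level malignancy score as a saliency-weighted mean of patch scores.

    patch_probs: per-patch malignancy probabilities (e.g. from a patch-level ViT).
    patch_saliency: non-negative per-patch weights (e.g. mean saliency per patch).
    """
    p = np.asarray(patch_probs, dtype=float)
    w = np.asarray(patch_saliency, dtype=float)
    total = w.sum()
    if total <= 0:
        return float(p.mean())  # assumed fallback when no patch is salient
    return float(np.sum(w * p) / total)
```

A whole-surface image could then be labeled malignant when this score exceeds a chosen threshold; the threshold is a further assumption that would need to be tuned on validation data.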

arXiv information

Authors	Pouya Afshin, David Helminiak, Tongtong Lu, Tina Yen, Julie M. Jorns, Mollie Patton, Bing Yu, Dong Hye Ye
Published	2025-05-12 15:22:54+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction

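Posted: May 13, 2025 | Author: jarxiv

Summary

Recently, 2D Gaussian Splatting (2DGS) has demonstrated better geometry reconstruction quality than the popular 3DGS by using 2D surfels to approximate thin surfaces.
However, it falls short on glossy surfaces, leaving visible holes in these areas.
We found that reflection discontinuity causes the problem.
To fit the jump from diffuse to specular reflection across viewing angles, a depth bias is introduced into the optimized Gaussian primitives.
To address this, we first replace the depth distortion loss in 2DGS with a novel depth convergence loss, which imposes a strong constraint on depth continuity.
We then rectify the depth criterion used to determine the actual surface so that it fully accounts for all the Gaussians intersected along the ray.
Qualitative and quantitative evaluations across various datasets show that our method significantly improves reconstruction quality, producing more complete and accurate surfaces than 2DGS.

Summary (original)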

Recently, 2D Gaussian Splatting (2DGS) has demonstrated superior geometry reconstruction quality than the popular 3DGS by using 2D surfels to approximate thin surfaces. However, it falls short when dealing with glossy surfaces, resulting in visible holes in these areas. We found the reflection discontinuity causes the issue. To fit the jump from diffuse to specular reflection at different viewing angles, depth bias is introduced in the optimized Gaussian primitives. To address that, we first replace the depth distortion loss in 2DGS with a novel depth convergence loss, which imposes a strong constraint on depth continuity. Then, we rectified the depth criterion in determining the actual surface, which fully accounts for all the intersecting Gaussians along the ray. Qualitative and quantitative evaluations across various datasets reveal that our method significantly improves reconstruction quality, with more complete and accurate surfaces than 2DGS.

arXiv information

Authors	Xiaoming Peng, Yixin Yang, Yang Zhou, Hui Huang
Published	2025-05-12 15:28:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV
