jarxiv | Japanese arxiv | Page 1541

Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures

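Posted: March 3, 2025 by jarxiv

Abstract

The COVID-19 pandemic strained healthcare resources and prompted discussion about how machine learning can reduce physicians' workload and contribute to diagnosis.
Chest X-rays (CXRs) are used to diagnose COVID-19, but few studies predict the severity of a patient's condition from CXRs.
In this study, we build a large COVID-19 severity dataset by merging three sources and investigate the effectiveness of transfer learning with ImageNet- and CXR-pretrained models and vision transformers (ViTs) on both severity regression and classification tasks.
A pretrained DenseNet161 model performed best on the three-class severity prediction problem, reaching 80% accuracy overall and 77.3%, 83.9%, and 70% on mild, moderate, and severe cases, respectively.
The ViT achieved the best regression results, with a mean absolute error of 0.5676 relative to radiologist-predicted severity scores.
The project's source code is publicly available.

Abstract (original)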

The COVID-19 pandemic strained healthcare resources and prompted discussion about how machine learning can alleviate physician burdens and contribute to diagnosis. Chest x-rays (CXRs) are used for diagnosis of COVID-19, but few studies predict the severity of a patient’s condition from CXRs. In this study, we produce a large COVID severity dataset by merging three sources and investigate the efficacy of transfer learning using ImageNet- and CXR-pretrained models and vision transformers (ViTs) in both severity regression and classification tasks. A pretrained DenseNet161 model performed the best on the three class severity prediction problem, reaching 80% accuracy overall and 77.3%, 83.9%, and 70% on mild, moderate and severe cases, respectively. The ViT had the best regression results, with a mean absolute error of 0.5676 compared to radiologist-predicted severity scores. The project’s source code is publicly available.
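The abstract reports an overall accuracy, per-class accuracies, and a regression MAE. The sketch below shows how these metrics are typically computed, assuming "per-class accuracy" means the share of each true class that was classified correctly; the labels and scores are made-up toy values, not the paper's data.

```python
from collections import Counter

def accuracy_report(y_true, y_pred):
    """Overall accuracy plus per-class accuracy (share of each true class predicted correctly)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    totals = Counter(y_true)
    overall = sum(hits.values()) / len(y_true)
    per_class = {c: hits[c] / n for c, n in totals.items()}
    return overall, per_class

def mean_absolute_error(targets, preds):
    """MAE between reference severity scores (e.g. radiologist ratings) and model predictions."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)

# Toy data (hypothetical, not from the paper).
y_true = ["mild", "mild", "moderate", "moderate", "severe"]
y_pred = ["mild", "moderate", "moderate", "moderate", "mild"]
overall, per_class = accuracy_report(y_true, y_pred)
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

Overall accuracy is the class-frequency-weighted average of the per-class accuracies, so it always lies between the lowest and highest per-class value.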

arXiv information

Authors	Luis Lara,Lucia Eve Berger,Rajesh Raju
Published	2025-02-28 14:34:45+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

Adaptive Accelerated Proximal Gradient Methods with Variance Reduction for Composite Nonconvex Finite-Sum Minimization

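Posted: March 3, 2025 by jarxiv

Abstract

This paper proposes {\sf AAPG-SPIDER}, an adaptive accelerated proximal gradient (AAPG) method with variance reduction for minimizing composite nonconvex finite-sum functions.
It integrates three acceleration techniques: adaptive step sizes, Nesterov's extrapolation, and the recursive stochastic path-integrated estimator SPIDER.
While it targets stochastic finite-sum problems, {\sf AAPG-SPIDER} reduces to {\sf AAPG} in the full-batch, non-stochastic setting, which is also of independent interest.
To our knowledge, {\sf AAPG-SPIDER} and {\sf AAPG} are the first learning-rate-free methods to achieve optimal iteration complexity for this class of \textit{composite} minimization problems.
Specifically, {\sf AAPG} achieves the optimal iteration complexity of $\mathcal{O}(N \epsilon^{-2})$, while {\sf AAPG-SPIDER} achieves $\mathcal{O}(N + \sqrt{N} \epsilon^{-2})$ for finding $\epsilon$-approximate stationary points, where $N$ is the number of component functions.
Under the Kurdyka-Lojasiewicz (KL) assumption, we establish non-ergodic convergence rates for both methods.
Preliminary experiments on sparse phase retrieval and linear eigenvalue problems demonstrate the superior performance of {\sf AAPG-SPIDER} and {\sf AAPG} compared to existing methods.

Abstract (original)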

This paper proposes {\sf AAPG-SPIDER}, an Adaptive Accelerated Proximal Gradient (AAPG) method with variance reduction for minimizing composite nonconvex finite-sum functions. It integrates three acceleration techniques: adaptive stepsizes, Nesterov’s extrapolation, and the recursive stochastic path-integrated estimator SPIDER. While targeting stochastic finite-sum problems, {\sf AAPG-SPIDER} simplifies to {\sf AAPG} in the full-batch, non-stochastic setting, which is also of independent interest. To our knowledge, {\sf AAPG-SPIDER} and {\sf AAPG} are the first learning-rate-free methods to achieve optimal iteration complexity for this class of \textit{composite} minimization problems. Specifically, {\sf AAPG} achieves the optimal iteration complexity of $\mathcal{O}(N \epsilon^{-2})$, while {\sf AAPG-SPIDER} achieves $\mathcal{O}(N + \sqrt{N} \epsilon^{-2})$ for finding $\epsilon$-approximate stationary points, where $N$ is the number of component functions. Under the Kurdyka-Lojasiewicz (KL) assumption, we establish non-ergodic convergence rates for both methods. Preliminary experiments on sparse phase retrieval and linear eigenvalue problems demonstrate the superior performance of {\sf AAPG-SPIDER} and {\sf AAPG} compared to existing methods.
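To make the three ingredients concrete, here is a minimal sketch of a proximal SPIDER loop with Nesterov-style extrapolation for F(x) = (1/N) Σ_i f_i(x) + h(x), using the toy choice f_i(x) = ½(a_iᵀx − b_i)² and h(x) = λ‖x‖₁. It is not the author's AAPG-SPIDER: the adaptive step-size rule is replaced by a fixed step `eta`, the toy problem is convex rather than nonconvex, and the names (`prox_spider`, `q`, `batch`) are illustrative assumptions.

```python
import random

def soft_threshold(z, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v > 0 else -1.0) for v in z]

def grad_i(x, a, b):
    """Gradient of f_i(x) = 0.5 * (a.x - b)^2."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [r * ai for ai in a]

def batch_grad(x, data, idx):
    """Average gradient of the components listed in idx."""
    idx = list(idx)
    g = [0.0] * len(x)
    for i in idx:
        for k, gk in enumerate(grad_i(x, *data[i])):
            g[k] += gk
    return [gk / len(idx) for gk in g]

def objective(x, data, lam):
    """F(x) = mean_i f_i(x) + lam * ||x||_1."""
    smooth = sum(0.5 * (sum(ai * xi for ai, xi in zip(a, x)) - b) ** 2 for a, b in data) / len(data)
    return smooth + lam * sum(abs(v) for v in x)

def prox_spider(data, x0, lam, eta=0.1, beta=0.3, q=10, batch=5, iters=200, seed=0):
    rng = random.Random(seed)
    n = len(data)
    x_prev, x = list(x0), list(x0)
    y_prev, v = None, None
    for t in range(iters):
        # Nesterov-style extrapolation from the last two iterates.
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        if t % q == 0:
            # Periodic full-batch refresh of the gradient estimator.
            v = batch_grad(y, data, range(n))
        else:
            # SPIDER recursion: correct the previous estimate with a mini-batch gradient difference.
            S = [rng.randrange(n) for _ in range(batch)]
            g_new, g_old = batch_grad(y, data, S), batch_grad(y_prev, data, S)
            v = [vk + gn - go for vk, gn, go in zip(v, g_new, g_old)]
        x_prev, y_prev = x, y
        # Proximal gradient step with a FIXED step size (AAPG's adaptive rule is not reproduced).
        x = soft_threshold([yi - eta * vi for yi, vi in zip(y, v)], eta * lam)
    return x

# Toy problem: N = 50 components with a sparse ground truth (hypothetical data).
data_rng = random.Random(1)
x_true = [1.0, 0.0, -2.0]
data = []
for _ in range(50):
    a = [data_rng.gauss(0.0, 1.0) for _ in range(3)]
    data.append((a, sum(ai * xi for ai, xi in zip(a, x_true))))
x_hat = prox_spider(data, [0.0, 0.0, 0.0], lam=0.01)
```

The periodic full refresh every `q` steps is what keeps the SPIDER estimator's accumulated error bounded; between refreshes only `batch` component gradients are evaluated per step, which is where the $\sqrt{N}$ factor in the stochastic complexity comes from.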

arXiv information

Authors	Ganzhao Yuan
Published	2025-02-28 14:37:56+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV, cs.NA, math.NA, math.OC

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

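Posted: March 3, 2025 by jarxiv

Abstract

Portrait animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation.
Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which balances computational efficiency and controllability.
Building on this, we develop a video-driven portrait animation framework named LivePortrait, focusing on better generalization, controllability, and efficiency for practical use.
To improve generation quality and generalization, we scale the training data up to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives.
Additionally, we find that compact implicit keypoints can effectively represent a kind of blendshapes, and we carefully design a stitching module and two retargeting modules, which use a small MLP with negligible computational overhead, to improve controllability.
Experimental results demonstrate the efficacy of our framework, even compared to diffusion-based methods.
Generation speed reaches a remarkable 12.8 ms on an RTX 4090 GPU with PyTorch.
The inference code and models are available at https://github.com/KwaiVGI/LivePortrait

Abstract (original)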

Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait
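Implicit-keypoint frameworks drive animation by re-posing a set of canonical 3D keypoints with a head rotation, expression offsets, a scale, and a translation. The sketch below assumes the form x = s · (R x_c + δ) + t used in keypoint-based reenactment work such as face-vid2vid; the abstract does not spell out LivePortrait's exact motion transformation, and the function names, Euler-angle convention, and numbers are illustrative. The stitching and retargeting MLPs are not modelled.

```python
import math

def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation(pitch, yaw, roll):
    """R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians (convention assumed for illustration)."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))

def transform_keypoints(x_c, R, delta, scale, t):
    """Map canonical keypoints to posed keypoints: x = scale * (R @ x_c + delta) + t."""
    posed = []
    for p, d in zip(x_c, delta):
        rotated = [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]
        posed.append([scale * (rotated[i] + d[i]) + t[i] for i in range(3)])
    return posed

# Hypothetical motion transfer: the source's canonical keypoints are re-posed with the
# rotation, expression offsets, scale and translation estimated from a driving frame.
source_canonical = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
driving_R = rotation(pitch=0.0, yaw=math.pi / 2, roll=0.0)
driving_delta = [[0.0, 0.0, 0.0], [0.0, 0.1, 0.0]]
driven = transform_keypoints(source_canonical, driving_R, driving_delta, scale=1.0, t=[0.0, 0.0, 0.0])
```

Because motion is carried by a handful of keypoints rather than a dense latent, this step costs only a few small matrix products per frame, which is consistent with the efficiency argument in the abstract.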

arXiv information

Authors	Jianzhu Guo,Dingyun Zhang,Xiaoqiang Liu,Zhizhou Zhong,Yuan Zhang,Pengfei Wan,Di Zhang
Published	2025-02-28 14:39:17+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV

Intuitive Surgical SurgToolLoc Challenge Results: 2022-2023

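Posted: March 3, 2025 by jarxiv

Abstract

Robot-assisted (RA) surgery promises to transform surgical intervention.
Intuitive Surgical is committed to fostering these changes and the machine learning models and algorithms that will enable them.
With these goals in mind, we have invited the surgical data science community to participate in a yearly competition hosted through the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) conference.
With changes from year to year, we have challenged the community to solve difficult machine learning problems in the context of advanced RA applications.
Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc).
The publicly released dataset that accompanies these challenges is detailed in a separate paper, arXiv:2501.09209 [1].

Abstract (original)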

Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is committed to fostering these changes and the machine learning models and algorithms that will enable them. With these goals in mind we have invited the surgical data science community to participate in a yearly competition hosted through the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) conference. With varying changes from year to year, we have challenged the community to solve difficult machine learning problems in the context of advanced RA applications. Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc). The publicly released dataset that accompanies these challenges is detailed in a separate paper arXiv:2501.09209 [1].

arXiv information

Authors	Aneeq Zia,Max Berniker,Rogerio Garcia Nespolo,Conor Perreault,Kiran Bhattacharyya,Xi Liu,Ziheng Wang,Satoshi Kondo,Satoshi Kasai,Kousuke Hirasawa,Bo Liu,David Austin,Yiheng Wang,Michal Futrega,Jean-Francois Puget,Zhenqiang Li,Yoichi Sato,Ryo Fujii,Ryo Hachiuma,Mana Masuda,Hideo Saito,An Wang,Mengya Xu,Mobarakol Islam,Long Bai,Winnie Pang,Hongliang Ren,Chinedu Nwoye,Luca Sestini,Nicolas Padoy,Maximilian Nielsen,Samuel Schüttler,Thilo Sentker,Hümeyra Husseini,Ivo Baltruschat,Rüdiger Schmitz,René Werner,Aleksandr Matsun,Mugariya Farooq,Numan Saaed,Jose Renato Restom Viera,Mohammad Yaqub,Neil Getty,Fangfang Xia,Zixuan Zhao,Xiaotian Duan,Xing Yao,Ange Lou,Hao Yang,Jintong Han,Jack Noble,Jie Ying Wu,Tamer Abdulbaki Alshirbaji,Nour Aldeen Jalal,Herag Arabian,Ning Ding,Knut Moeller,Weiliang Chen,Quan He,Muhammad Bilal,Taofeek Akinosho,Adnan Qayyum,Massimo Caputo,Hunaid Vohra,Michael Loizou,Anuoluwapo Ajayi,Ilhem Berrou,Faatihah Niyi-Odumosu,Charlie Budd,Oluwatosin Alabi,Tom Vercauteren,Ruoxi Zhao,Ayberk Acar,John Han,Jumanh Atoum,Yinhong Qin,Jie Ying Wu,Surong Hua,Lu Ping,Wenming Wu,Rongfeng Wei,Jinlin Wu,You Pang,Zhen Chen,Tim Jaspers,Amine Yamlahi,Piotr Kalinowski,Dominik Michael,Tim Rädsch,Marco Hübner,Danail Stoyanov,Stefanie Speidel,Lena Maier-Hein,Anthony Jarc
Published	2025-02-28 14:42:27+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV

A Non-contrast Head CT Foundation Model for Comprehensive Neuro-Trauma Triage

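Posted: March 3, 2025 by jarxiv

Abstract

Recent advances in AI and medical imaging offer transformative potential for emergency head CT interpretation, reducing assessment times and improving accuracy in the face of growing demand for such scans and a global shortage of radiologists.
This study introduces a 3D foundation model for detecting diverse neuro-trauma findings with high accuracy and efficiency.
Using large language models (LLMs) for automatic labeling, we generated comprehensive multi-label annotations for critical conditions.
Our approach involved pretraining neural networks for hemorrhage subtype segmentation and brain anatomy parcellation, which were integrated into a pretrained comprehensive neuro-trauma detection network through multimodal fine-tuning.
Performance evaluation against expert annotations and comparison with CT-CLIP demonstrated strong triage accuracy across major neuro-trauma findings, such as hemorrhage and midline shift, as well as less frequent critical conditions such as cerebral edema and arterial hyperdensity.
Integrating neuro-specific features significantly enhanced diagnostic capability, achieving an average AUC of 0.861 across 16 neuro-trauma conditions.
This work advances foundation models in medical imaging and serves as a benchmark for future AI-assisted neuro-trauma diagnostics in emergency radiology.

Abstract (original)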

Recent advancements in AI and medical imaging offer transformative potential in emergency head CT interpretation for reducing assessment times and improving accuracy in the face of an increasing request of such scans and a global shortage in radiologists. This study introduces a 3D foundation model for detecting diverse neuro-trauma findings with high accuracy and efficiency. Using large language models (LLMs) for automatic labeling, we generated comprehensive multi-label annotations for critical conditions. Our approach involved pretraining neural networks for hemorrhage subtype segmentation and brain anatomy parcellation, which were integrated into a pretrained comprehensive neuro-trauma detection network through multimodal fine-tuning. Performance evaluation against expert annotations and comparison with CT-CLIP demonstrated strong triage accuracy across major neuro-trauma findings, such as hemorrhage and midline shift, as well as less frequent critical conditions such as cerebral edema and arterial hyperdensity. The integration of neuro-specific features significantly enhanced diagnostic capabilities, achieving an average AUC of 0.861 for 16 neuro-trauma conditions. This work advances foundation models in medical imaging, serving as a benchmark for future AI-assisted neuro-trauma diagnostics in emergency radiology.
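The abstract summarizes 16 conditions with a single "average AUC". A plausible reading, assumed here, is a macro average: compute ROC AUC per condition, then take the unweighted mean. The sketch computes AUC as the probability that a random positive scan outscores a random negative one; the labels and scores are toy values.

```python
def auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def macro_auc(label_rows, score_rows):
    """Unweighted mean of per-condition AUCs; rows are scans, columns are conditions."""
    n_conditions = len(label_rows[0])
    per_condition = [
        auc([row[c] for row in label_rows], [row[c] for row in score_rows])
        for c in range(n_conditions)
    ]
    return sum(per_condition) / n_conditions, per_condition

# Toy multi-label example: 4 scans x 2 conditions (hypothetical values).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.7], [0.5, 0.3]]
mean_auc, per_condition = macro_auc(labels, scores)
```

The pairwise loop is O(P·N) and written for clarity; scikit-learn's `roc_auc_score` with `average="macro"` computes the same multilabel quantity efficiently. A macro average gives rare conditions such as arterial hyperdensity the same weight as common ones such as hemorrhage.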

arXiv information

Authors	Youngjin Yoo,Bogdan Georgescu,Yanbo Zhang,Sasa Grbic,Han Liu,Gabriela D. Aldea,Thomas J. Re,Jyotipriya Das,Poikavila Ullaskrishnan,Eva Eibenberger,Andrei Chekkoury,Uttam K. Bodanapally,Savvas Nicolaou,Pina C. Sanelli,Thomas J. Schroeppel,Yvonne W. Lui,Eli Gibson
Published	2025-02-28 14:44:55+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

‘No negatives needed’: weakly-supervised regression for interpretable tumor detection in whole-slide histopathology images

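Posted: March 3, 2025 by jarxiv

Abstract

Accurate tumor detection in digital pathology whole-slide images (WSIs) is crucial for cancer diagnosis and treatment planning.
Multiple instance learning (MIL) has emerged as a widely used approach for weakly supervised tumor detection on large-scale data without the need for manual annotations.
However, traditional MIL methods often rely on classification tasks that require tumor-free cases as negative examples, which are difficult to obtain in real-world clinical workflows, especially for surgical resection specimens.
We address this limitation by reformulating tumor detection as a regression task that estimates the tumor percentage of a WSI, a target that is clinically available across multiple cancer types.
In this paper, we analyze the proposed weakly supervised regression framework by applying it to multiple organs, specimen types, and clinical scenarios.
We characterize the robustness of the framework to tumor percentage as a noisy regression target, and introduce a novel amplification technique to improve tumor detection sensitivity when learning from small tumor regions.
Finally, we provide interpretable insights into the model's predictions by analyzing visual attention and logit maps.
Our code is available at https://github.com/DIAGNijmegen/tumor-percentage-mil-regression.

Abstract (original)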

Accurate tumor detection in digital pathology whole-slide images (WSIs) is crucial for cancer diagnosis and treatment planning. Multiple Instance Learning (MIL) has emerged as a widely used approach for weakly-supervised tumor detection with large-scale data without the need for manual annotations. However, traditional MIL methods often depend on classification tasks that require tumor-free cases as negative examples, which are challenging to obtain in real-world clinical workflows, especially for surgical resection specimens. We address this limitation by reformulating tumor detection as a regression task, estimating tumor percentages from WSIs, a clinically available target across multiple cancer types. In this paper, we provide an analysis of the proposed weakly-supervised regression framework by applying it to multiple organs, specimen types and clinical scenarios. We characterize the robustness of our framework to tumor percentage as a noisy regression target, and introduce a novel concept of amplification technique to improve tumor detection sensitivity when learning from small tumor regions. Finally, we provide interpretable insights into the model’s predictions by analyzing visual attention and logit maps. Our code is available at https://github.com/DIAGNijmegen/tumor-percentage-mil-regression.
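Treating detection as regression means each slide needs only a tumor-percentage label, so slides that contain tumor everywhere still provide supervision and no tumor-free negatives are required. A minimal forward-pass sketch, assuming attention-based MIL pooling over patch embeddings and a sigmoid output compared with the percentage divided by 100; the weight names `V`, `w`, `u` are hypothetical and the paper's architecture, loss, and amplification technique may differ.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def predict_tumor_fraction(instances, V, w, u):
    """Attention-pooled bag embedding followed by a sigmoid regression head.

    instances: list of D-dim patch embeddings; V: H x D projection; w: H-dim attention
    vector; u: D-dim regression weights. All weights are hypothetical placeholders.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v_d * h_d for v_d, h_d in zip(row, h))) for row in V]
        scores.append(sum(w_j * a_j for w_j, a_j in zip(w, hidden)))
    attention = softmax(scores)  # one weight per patch; can be rendered as an attention map
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attention, instances)) for d in range(dim)]
    logit = sum(u_d * z_d for u_d, z_d in zip(u, bag))
    return 1.0 / (1.0 + math.exp(-logit)), attention

def squared_error(pred_fraction, tumor_percentage):
    """Regression loss against the slide-level tumor percentage (0-100)."""
    return (pred_fraction - tumor_percentage / 100.0) ** 2

# Toy bag of three 2-D patch embeddings (hypothetical values).
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pred, attention = predict_tumor_fraction(patches, V=[[1.0, -1.0]], w=[2.0], u=[0.5, 0.5])
loss = squared_error(pred, tumor_percentage=40.0)
```

The per-patch `attention` weights are the kind of quantity the abstract's attention maps visualize: patches that drive the tumor-fraction estimate receive larger weights.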

arXiv information

Authors	Marina D’Amato,Jeroen van der Laak,Francesco Ciompi
Published	2025-02-28 14:47:20+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

The 3D-PC: a benchmark for visual perspective taking in humans and machines

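Posted: March 3, 2025 by jarxiv

Abstract

Visual perspective taking (VPT) is the ability to perceive and reason about the perspectives of others.
It is an essential feature of human intelligence that develops over the first decade of life and requires the ability to process the 3D structure of visual scenes.
A growing number of reports indicate that deep neural networks (DNNs) become capable of analyzing 3D scenes after training on large image datasets.
We investigated whether this emergent ability for 3D analysis in DNNs is sufficient for VPT using the 3D perception challenge (3D-PC), a novel benchmark for 3D perception in humans and DNNs.
The 3D-PC comprises three 3D-analysis tasks posed within natural scene images: 1. a simple test of object depth order, 2. a basic VPT task (VPT-basic), and 3. another version of VPT (VPT-Strategy) designed to limit the effectiveness of 'shortcut' visual strategies.
We tested human participants (N = 33), linearly probed or text-prompted over 300 DNNs on the challenge, and found that nearly all of the DNNs approached or exceeded human accuracy in analyzing object depth order.
Surprisingly, DNN accuracy on this task correlated with object recognition performance.
In contrast, there was an extraordinary gap between DNNs and humans on VPT-basic.
Humans were nearly perfect, whereas most DNNs were near chance.
Fine-tuning DNNs on VPT-basic brought them close to human performance, but, unlike humans, they dropped back to chance when tested on VPT-Strategy.
Our challenge demonstrates that the training routines and architectures of today's DNNs are well suited for learning basic 3D properties of scenes and objects but ill suited for reasoning about these properties as humans do.
We release the 3D-PC datasets and code to help bridge this gap in 3D perception between humans and machines.

Abstract (original)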

Visual perspective taking (VPT) is the ability to perceive and reason about the perspectives of others. It is an essential feature of human intelligence, which develops over the first decade of life and requires an ability to process the 3D structure of visual scenes. A growing number of reports have indicated that deep neural networks (DNNs) become capable of analyzing 3D scenes after training on large image datasets. We investigated if this emergent ability for 3D analysis in DNNs is sufficient for VPT with the 3D perception challenge (3D-PC): a novel benchmark for 3D perception in humans and DNNs. The 3D-PC is comprised of three 3D-analysis tasks posed within natural scene images: 1. a simple test of object depth order, 2. a basic VPT task (VPT-basic), and 3. another version of VPT (VPT-Strategy) designed to limit the effectiveness of ‘shortcut’ visual strategies. We tested human participants (N=33) and linearly probed or text-prompted over 300 DNNs on the challenge and found that nearly all of the DNNs approached or exceeded human accuracy in analyzing object depth order. Surprisingly, DNN accuracy on this task correlated with their object recognition performance. In contrast, there was an extraordinary gap between DNNs and humans on VPT-basic. Humans were nearly perfect, whereas most DNNs were near chance. Fine-tuning DNNs on VPT-basic brought them close to human performance, but they, unlike humans, dropped back to chance when tested on VPT-Strategy. Our challenge demonstrates that the training routines and architectures of today’s DNNs are well-suited for learning basic 3D properties of scenes and objects but are ill-suited for reasoning about these properties as humans do. We release our 3D-PC datasets and code to help bridge this gap in 3D perception between humans and machines.
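Linear probing keeps a pretrained network frozen and trains only a linear readout on its features, so probe accuracy measures how linearly accessible a property (such as depth order) already is in the representation. A minimal sketch, assuming a logistic-regression readout fitted by full-batch gradient descent; the 2-D feature vectors are toy stand-ins for frozen DNN embeddings, and the learning rate and epoch count are arbitrary.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression readout w.x + b; the feature extractor itself is never updated."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, features, labels):
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy frozen features (hypothetical stand-ins for DNN embeddings) with binary task labels.
feats = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
labels = [1, 1, 0, 0]
w, b = train_linear_probe(feats, labels)
acc = probe_accuracy(w, b, feats, labels)
```

Fine-tuning, by contrast, also updates the backbone; the abstract's finding is that this closes the gap on VPT-basic but does not transfer to VPT-Strategy.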

arXiv information

Authors	Drew Linsley,Peisen Zhou,Alekh Karkada Ashok,Akash Nagaraj,Gaurav Gaonkar,Francis E Lewis,Zygmunt Pizlo,Thomas Serre
Published	2025-02-28 14:49:44+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV, cs.HC

SEE: See Everything Every Time — Adaptive Brightness Adjustment for Broad Light Range Images via Events

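Posted: March 3, 2025 by jarxiv

Abstract

Event cameras, with a high dynamic range exceeding 120 dB, significantly outperform traditional embedded cameras and robustly record detailed change information under various lighting conditions, including both low- and high-light situations.
However, recent research on event data has primarily focused on low-light image enhancement, neglecting image enhancement and brightness adjustment across a broader range of lighting conditions, such as normal or high illumination.
Based on this, we pose a new research question: how can events be used to enhance and adaptively adjust the brightness of images captured under broad lighting conditions?
To investigate this question, we first collected a new dataset, SEE-600K, consisting of 610,126 images and corresponding events across 202 scenarios, each featuring an average of four lighting conditions with over a 1000-fold variation in illumination.
We then propose a framework that effectively uses events to smoothly adjust image brightness through prompts.
Our framework captures color through sensor patterns, uses cross-attention to model events as a brightness dictionary, and adjusts the image's dynamic range to form a broad light-range representation (BLR), which is then decoded at the pixel level based on the brightness prompt.
Experimental results show that our method not only performs well on the low-light enhancement dataset but also achieves robust performance on broader light-range image enhancement using the SEE-600K dataset.
Additionally, our approach enables pixel-level brightness adjustment, providing flexibility for post-processing and inspiring more imaging applications.
The dataset and source code are publicly available at https://github.com/yunfanLu/SEE.

Abstract (original)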

Event cameras, with a high dynamic range exceeding $120dB$, significantly outperform traditional embedded cameras, robustly recording detailed changing information under various lighting conditions, including both low- and high-light situations. However, recent research on utilizing event data has primarily focused on low-light image enhancement, neglecting image enhancement and brightness adjustment across a broader range of lighting conditions, such as normal or high illumination. Based on this, we propose a novel research question: how to employ events to enhance and adaptively adjust the brightness of images captured under broad lighting conditions? To investigate this question, we first collected a new dataset, SEE-600K, consisting of 610,126 images and corresponding events across 202 scenarios, each featuring an average of four lighting conditions with over a 1000-fold variation in illumination. Subsequently, we propose a framework that effectively utilizes events to smoothly adjust image brightness through the use of prompts. Our framework captures color through sensor patterns, uses cross-attention to model events as a brightness dictionary, and adjusts the image’s dynamic range to form a broad light-range representation (BLR), which is then decoded at the pixel level based on the brightness prompt. Experimental results demonstrate that our method not only performs well on the low-light enhancement dataset but also shows robust performance on broader light-range image enhancement using the SEE-600K dataset. Additionally, our approach enables pixel-level brightness adjustment, providing flexibility for post-processing and inspiring more imaging applications. The dataset and source code are publicly available at:https://github.com/yunfanLu/SEE.
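For intuition about the numbers above: image-sensor dynamic range is conventionally quoted as 20·log10(max/min), which is the convention assumed here. Under it, 120 dB corresponds to a 10^6:1 intensity range, and the dataset's 1000-fold illumination change corresponds to 60 dB.

```python
import math

def db_to_ratio(db):
    """Convert a dynamic range in dB to a max/min intensity ratio (20*log10 convention assumed)."""
    return 10 ** (db / 20.0)

def ratio_to_db(ratio):
    """Convert a max/min intensity ratio to dB (20*log10 convention assumed)."""
    return 20.0 * math.log10(ratio)

event_camera_ratio = db_to_ratio(120)  # roughly a 1,000,000 : 1 range
dataset_span_db = ratio_to_db(1000)    # a 1000-fold illumination change
```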

arXiv information

Authors	Yunfan Lu,Xiaogang Xu,Hao Lu,Yanlin Qian,Pengteng Li,Huizai Yao,Bin Yang,Junyi Li,Qianyi Cai,Weiyu Guo,Hui Xiong
Published	2025-02-28 14:55:37+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV

Progressive Curriculum Learning with Scale-Enhanced U-Net for Continuous Airway Segmentation

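Posted: March 3, 2025 by jarxiv

Abstract

Continuous and accurate segmentation of airways in chest CT images is essential for preoperative planning and real-time bronchoscopy navigation.
Despite advances in deep learning for medical image segmentation, maintaining airway continuity remains a challenge, particularly due to intra-class imbalance between large and small branches and blurred details in CT scans.
To address these challenges, we propose a progressive curriculum learning pipeline and a Scale-Enhanced U-Net (SE-UNet) to improve segmentation continuity.
Specifically, our progressive curriculum learning pipeline consists of three stages: extracting main airways, identifying small airways, and repairing discontinuities.
The cropping sampling strategy in each stage reduces feature interference between airways of different scales, effectively addressing intra-class imbalance.
In the third training stage, we present an Adaptive Topology-Responsive Loss (ATRL) to guide the network to focus on airway continuity.
All stages of the progressive training pipeline share the same SE-UNet, which integrates multi-scale inputs and Detail Information Enhancers (DIEs) to enhance information flow and effectively capture the intricate details of small airways.
Additionally, we propose a robust airway tree parsing method and hierarchical evaluation metrics to provide more clinically relevant and precise analysis.
Experiments on both in-house and public datasets show that our method outperforms existing approaches, significantly improving the accuracy of small airways and the completeness of the airway tree.
The code will be released upon publication.

Abstract (original)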

Continuous and accurate segmentation of airways in chest CT images is essential for preoperative planning and real-time bronchoscopy navigation. Despite advances in deep learning for medical image segmentation, maintaining airway continuity remains a challenge, particularly due to intra-class imbalance between large and small branches and blurred CT scan details. To address these challenges, we propose a progressive curriculum learning pipeline and a Scale-Enhanced U-Net (SE-UNet) to enhance segmentation continuity. Specifically, our progressive curriculum learning pipeline consists of three stages: extracting main airways, identifying small airways, and repairing discontinuities. The cropping sampling strategy in each stage reduces feature interference between airways of different scales, effectively addressing the challenge of intra-class imbalance. In the third training stage, we present an Adaptive Topology-Responsive Loss (ATRL) to guide the network to focus on airway continuity. The progressive training pipeline shares the same SE-UNet, integrating multi-scale inputs and Detail Information Enhancers (DIEs) to enhance information flow and effectively capture the intricate details of small airways. Additionally, we propose a robust airway tree parsing method and hierarchical evaluation metrics to provide more clinically relevant and precise analysis. Experiments on both in-house and public datasets demonstrate that our method outperforms existing approaches, significantly improving the accuracy of small airways and the completeness of the airway tree. The code will be released upon publication.
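The stage-wise cropping strategy can be illustrated with a small sampler that draws crop centers from a different airway population at each curriculum stage. Everything here is hypothetical: the split by branch generation (`main_max_generation`), the use of precomputed gap locations for the repair stage, and the crop size are illustrative choices not given in the abstract, and the ATRL loss is not modelled.

```python
import random

def crop_box(center, size, shape):
    """(start, end) per axis for a crop of `size` around `center`, shifted to stay inside `shape`."""
    box = []
    for c, s, dim in zip(center, size, shape):
        start = min(max(c - s // 2, 0), max(dim - s, 0))
        box.append((start, min(start + s, dim)))
    return tuple(box)

def sample_centers(airway_voxels, stage, n, rng, main_max_generation=3, gap_points=()):
    """Pick crop centers for one curriculum stage (hypothetical stage definitions).

    airway_voxels: list of ((z, y, x), branch_generation) pairs from the label volume.
    Stage 1 samples main airways, stage 2 small airways, stage 3 known gap locations.
    """
    if stage == 1:
        pool = [p for p, g in airway_voxels if g <= main_max_generation]
    elif stage == 2:
        pool = [p for p, g in airway_voxels if g > main_max_generation]
    elif stage == 3:
        pool = list(gap_points)
    else:
        raise ValueError("stage must be 1, 2 or 3")
    if not pool:
        raise ValueError("no candidate centers for this stage")
    return [rng.choice(pool) for _ in range(n)]

# Toy labelled voxels: (coordinate, branch generation); values are illustrative only.
voxels = [((10, 10, 10), 0), ((12, 11, 10), 2), ((30, 40, 25), 6), ((31, 42, 26), 8)]
rng = random.Random(0)
stage1 = sample_centers(voxels, stage=1, n=5, rng=rng)
stage2 = sample_centers(voxels, stage=2, n=5, rng=rng)
boxes = [crop_box(c, (32, 32, 32), (64, 64, 64)) for c in stage2]
```

Sampling small-airway centers separately in stage 2 keeps large trachea and bronchus voxels from dominating those crops, which is one way to read the abstract's claim that staged cropping reduces feature interference between scales.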

arXiv information

Authors	Bingyu Yang,Qingyao Tian,Huai Liao,Xinyan Huang,Jinlin Wu,Jingdi Hu,Hongbin Liu
Published	2025-02-28 15:04:56+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, eess.IV

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning

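Posted: March 3, 2025 by jarxiv

Abstract

Although multi-instance learning (MIL) has succeeded in pathological image classification, it faces high inference costs because it processes numerous patches from gigapixel whole slide images (WSIs).
To address this, we propose HDMIL, a hierarchical distillation multi-instance learning framework that achieves fast and accurate classification by eliminating irrelevant patches.
HDMIL consists of two key components: the dynamic multi-instance network (DMIN) and the lightweight instance pre-screening network (LIPN).
DMIN operates on high-resolution WSIs, while LIPN operates on their low-resolution counterparts.
During training, DMIN is trained for WSI classification while generating attention-score-based masks that indicate irrelevant patches.
These masks then guide the training of LIPN to predict the relevance of each low-resolution patch.
During testing, LIPN first determines the useful regions within the low-resolution WSI, which indirectly lets us eliminate irrelevant regions in the high-resolution WSI, reducing inference time without degrading performance.
In addition, we design the first Chebyshev-polynomial-based Kolmogorov-Arnold classifier in computational pathology, which improves HDMIL's performance through learnable activation layers.
Extensive experiments on three public datasets show that HDMIL outperforms previous state-of-the-art methods; for example, on the Camelyon16 dataset it improves AUC by 3.13% while reducing inference time by 28.6%.

Abstract (original)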

Although multi-instance learning (MIL) has succeeded in pathological image classification, it faces the challenge of high inference costs due to processing numerous patches from gigapixel whole slide images (WSIs). To address this, we propose HDMIL, a hierarchical distillation multi-instance learning framework that achieves fast and accurate classification by eliminating irrelevant patches. HDMIL consists of two key components: the dynamic multi-instance network (DMIN) and the lightweight instance pre-screening network (LIPN). DMIN operates on high-resolution WSIs, while LIPN operates on the corresponding low-resolution counterparts. During training, DMIN are trained for WSI classification while generating attention-score-based masks that indicate irrelevant patches. These masks then guide the training of LIPN to predict the relevance of each low-resolution patch. During testing, LIPN first determines the useful regions within low-resolution WSIs, which indirectly enables us to eliminate irrelevant regions in high-resolution WSIs, thereby reducing inference time without causing performance degradation. In addition, we further design the first Chebyshev-polynomials-based Kolmogorov-Arnold classifier in computational pathology, which enhances the performance of HDMIL through learnable activation layers. Extensive experiments on three public datasets demonstrate that HDMIL outperforms previous state-of-the-art methods, e.g., achieving improvements of 3.13% in AUC while reducing inference time by 28.6% on the Camelyon16 dataset.
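A Kolmogorov-Arnold layer replaces fixed activations with a learnable univariate function on each input-output edge. One common parameterization, assumed here, writes each edge function as a weighted sum of Chebyshev polynomials evaluated on tanh-squashed inputs, so output j is y_j = Σ_i Σ_k c_{j,i,k} T_k(tanh x_i). The sketch shows only the forward pass; HDMIL's actual classifier design, polynomial degree, and normalization are not specified in the abstract, and the weights below are hypothetical.

```python
import math

def chebyshev_basis(x, degree):
    """T_0(x) .. T_degree(x) via the recurrence T_{k+1} = 2x T_k - T_{k-1}; expects x in [-1, 1]."""
    T = [1.0, x]
    for _ in range(2, degree + 1):
        T.append(2.0 * x * T[-1] - T[-2])
    return T[: degree + 1]

def cheby_kan_layer(inputs, coeffs, degree):
    """Forward pass of a Chebyshev-parameterized KAN layer.

    coeffs[j][i][k] is the learnable weight of basis T_k on the edge from input i to output j,
    so output j = sum_i sum_k coeffs[j][i][k] * T_k(tanh(inputs[i])).
    """
    basis = [chebyshev_basis(math.tanh(x), degree) for x in inputs]
    return [
        sum(c_j[i][k] * basis[i][k] for i in range(len(inputs)) for k in range(degree + 1))
        for c_j in coeffs
    ]

# Toy example: 2 inputs -> 1 output logit with degree-2 edge functions (hypothetical weights).
logits = cheby_kan_layer([0.0, 0.5], [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]], degree=2)
```

The tanh squashing keeps inputs inside [-1, 1], where Chebyshev polynomials are bounded, so the learnable coefficients alone control the shape of each activation.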

arXiv information

Authors	Jiuyang Dong,Junjun Jiang,Kui Jiang,Jiahan Li,Yongbing Zhang
Published	2025-02-28 15:10:07+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google

Categories: cs.CV
