jarxiv | Japanese arxiv | Page 950

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning

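Posted: April 14, 2025 | Author: jarxiv

Summary

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance, but the lack of transparent and faithful explanations has slowed clinical adoption.
Post hoc methods such as saliency maps may not reflect a model's true decision process.
Prototype-based reasoning offers a more transparent alternative: it grounds decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations.
We introduce ProtoECGNet, a prototype-based deep learning model for interpretable multi-label ECG classification.
ProtoECGNet employs a structured multi-branch architecture that mirrors clinical interpretation workflows. It integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities.
Each branch is trained with a prototype loss designed for multi-label learning. The loss combines clustering, separation, diversity, and a novel contrastive term that encourages appropriate separation between prototypes of unrelated classes while allowing prototypes of frequently co-occurring diagnoses to cluster.
We evaluate ProtoECGNet on all 71 diagnostic labels of the PTB-XL dataset and demonstrate performance competitive with state-of-the-art black-box models while providing structured, case-based explanations.
To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes and find that they are rated as representative and clear.
ProtoECGNet shows that prototype learning can be scaled effectively to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.

Summary (original)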

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model’s true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 diagnostic labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model’s projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
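The contrastive idea in the abstract (push apart prototypes of unrelated classes, tolerate closeness for diagnoses that often co-occur) can be illustrated with a small sketch. This is not ProtoECGNet's actual loss, whose exact form the abstract does not give. It is a minimal pure-Python illustration that assumes cosine similarity between prototype vectors and a hypothetical co-occurrence matrix `cooc` with entries in [0, 1]. The function names, the weighting `1 - 2 * cooc`, and the toy data are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cooccurrence_contrastive_loss(prototypes, classes, cooc):
    """Illustrative contrastive term over prototype pairs (an assumption,
    not the paper's formula).

    prototypes: list of equal-length vectors
    classes:    class index of each prototype
    cooc:       cooc[i][j] in [0, 1], how often classes i and j co-occur
    """
    total, pairs = 0.0, 0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            ci, cj = classes[i], classes[j]
            if ci == cj:
                continue  # only pairs from different classes contribute
            sim = cosine(prototypes[i], prototypes[j])
            # Weight is +1 for never co-occurring classes (similarity is
            # penalized) and -1 for always co-occurring ones (rewarded).
            total += (1.0 - 2.0 * cooc[ci][cj]) * sim
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy data: two identical prototypes that belong to different classes.
protos = [[1.0, 0.0], [1.0, 0.0]]
labels = [0, 1]
unrelated = [[1.0, 0.0], [0.0, 1.0]]   # classes never co-occur
related = [[1.0, 1.0], [1.0, 1.0]]     # classes always co-occur

loss_unrelated = cooccurrence_contrastive_loss(protos, labels, unrelated)
loss_related = cooccurrence_contrastive_loss(protos, labels, related)
print(loss_unrelated, loss_related)
```

In this toy setup the same pair of prototypes is penalized when the classes are unrelated and rewarded when they always co-occur, which is the qualitative behavior the abstract describes.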

arXiv information

Authors	Sahil Sethi, David Chen, Thomas Statchen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones
Published	2025-04-11 17:23:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

DocAgent: A Multi-Agent System for Automated Code Documentation Generation

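Posted: April 14, 2025 | Author: jarxiv

Summary

High-quality code documentation is crucial for software development, especially in the era of AI.
However, generating it automatically with large language models (LLMs) remains challenging, because existing approaches often produce incomplete, unhelpful, or factually incorrect output.
We introduce DocAgent, a novel multi-agent collaborative system that uses topological code processing for incremental context building.
Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) collaboratively generate the documentation.
We also propose a multi-faceted evaluation framework that assesses completeness, helpfulness, and truthfulness.
Comprehensive experiments show that DocAgent consistently and significantly outperforms baselines.
Our ablation study confirms the vital role of the topological processing order.
DocAgent offers a robust approach to reliable code documentation generation in complex and proprietary repositories.

Summary (original)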

High-quality code documentation is crucial for software development especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete, unhelpful, or factually incorrect outputs. We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building. Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation. We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness. Comprehensive experiments show DocAgent significantly outperforms baselines consistently. Our ablation study confirms the vital role of the topological processing order. DocAgent offers a robust approach for reliable code documentation generation in complex and proprietary repositories.
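The topological processing order mentioned above can be sketched with Python's standard library. This is a minimal illustration of the general idea (document dependencies before the code that uses them), not DocAgent's implementation. The module names in `dependencies` and the helper `documentation_order` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each component maps to the set of
# components it depends on (its predecessors in the processing order).
dependencies = {
    "app.main": {"app.service", "app.utils"},
    "app.service": {"app.db", "app.utils"},
    "app.db": set(),
    "app.utils": set(),
}


def documentation_order(deps):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = documentation_order(dependencies)
print(order)
```

Processing components in this order means that, by the time a component is documented, documentation for everything it depends on already exists and can be reused as context. A dependency cycle makes `static_order()` raise `graphlib.CycleError`, so a real pipeline would need some way to break cycles first.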

arXiv information

Authors	Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang
Published	2025-04-11 17:50:08+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG, cs.SE

Towards an Understanding of Context Utilization in Code Intelligence

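Posted: April 14, 2025 | Author: jarxiv

Summary

Code intelligence is an emerging domain of software engineering that aims to improve the effectiveness and efficiency of various code-related tasks.
Recent research suggests that incorporating contextual information beyond the basic original task inputs (i.e., source code) can substantially improve model performance.
Such contextual signals can be obtained directly or indirectly from sources such as API documentation or from intermediate representations such as abstract syntax trees.
Despite growing academic interest, a systematic analysis of context in code intelligence is lacking.
To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. The investigation yields four main contributions:
(1) a quantitative analysis of the research landscape, including publication trends, venues, and explored domains;
(2) a novel taxonomy of the context types used in code intelligence;
(3) a task-oriented analysis of context integration strategies across diverse code intelligence tasks;
(4) a critical evaluation of the evaluation methodologies for context-aware methods.
Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.

Summary (original)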

Code intelligence is an emerging domain in software engineering, aiming to improve the effectiveness and efficiency of various code-related tasks. Recent research suggests that incorporating contextual information beyond the basic original task inputs (i.e., source code) can substantially enhance model performance. Such contextual signals, which may be obtained directly or indirectly from sources such as API documentation or intermediate representations like abstract syntax trees, can significantly improve the effectiveness of code intelligence. Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence. To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. Our investigation yields four main contributions: (1) a quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) a novel taxonomy of context types used in code intelligence; (3) a task-oriented analysis investigating context integration strategies across diverse code intelligence tasks; and (4) a critical evaluation of evaluation methodologies for context-aware methods. Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.

arXiv information

Authors	Yanlin Wang, Kefeng Duan, Dewu Zheng, Ensheng Shi, Fengji Zhang, Yanli Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Hongyu Zhang, Qianxiang Wang, Zibin Zheng
Published	2025-04-11 17:59:53+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.SE

Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation

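Posted: April 14, 2025 | Author: jarxiv

Summary

Generating synthetic images is a useful way to cheaply obtain labeled data for training computer vision models.
However, it requires accurate 3D models of the relevant objects, and the resulting images often have a realism gap because of the difficulty of simulating lighting effects and camera artifacts.
To address these challenges, we propose using the novel view synthesis method called Gaussian Splatting.
We have developed a synthetic data pipeline for generating high-quality, context-aware instance segmentation training data for specific objects.
The process is fully automated and requires only a video of the target object.
We train a Gaussian Splatting model of the target object and automatically extract the object from the video.
Leveraging Gaussian Splatting, we render the object onto a random background image and use monocular depth estimation to place it in a believable pose.
We introduce a novel dataset to validate the approach and show superior performance over other data generation approaches, such as Cut-and-Paste and diffusion model-based generation.

Summary (original)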

Generating synthetic images is a useful method for cheaply obtaining labeled data for training computer vision models. However, obtaining accurate 3D models of relevant objects is necessary, and the resulting images often have a gap in realism due to challenges in simulating lighting effects and camera artifacts. We propose using the novel view synthesis method called Gaussian Splatting to address these challenges. We have developed a synthetic data pipeline for generating high-quality context-aware instance segmentation training data for specific objects. This process is fully automated, requiring only a video of the target object. We train a Gaussian Splatting model of the target object and automatically extract the object from the video. Leveraging Gaussian Splatting, we then render the object on a random background image, and monocular depth estimation is employed to place the object in a believable pose. We introduce a novel dataset to validate our approach and show superior performance over other data generation approaches, such as Cut-and-Paste and Diffusion model-based generation.

arXiv information

Authors	Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels
Published	2025-04-11 12:04:49+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV

Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model

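Posted: April 14, 2025 | Author: jarxiv

Summary

Generating and inserting new objects into 3D content is a compelling approach to versatile scene recreation.
Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results.
To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting.
Our approach introduces a multi-view diffusion model called MVInpainter, built on a pre-trained stable video diffusion model to facilitate view-consistent object inpainting.
Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled, more predictable multi-view generation.
After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine the Gaussian Splatting reconstruction from these sparse inpainted views.
By leveraging these techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality.
Extensive experiments demonstrate that our approach outperforms existing methods.

Summary (original)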

Generating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine Gaussian Splatting reconstruction from these sparse inpainted views. By leveraging these techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality. Extensive experiments demonstrate that our approach outperforms existing methods.

arXiv information

Authors	Hongliang Zhong, Can Wang, Jingbo Zhang, Jing Liao
Published	2025-04-11 12:04:56+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV, cs.GR

ODverse33: Is the New YOLO Version Always Better? A Multi Domain benchmark from YOLO v5 to v11

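Posted: April 14, 2025 | Author: jarxiv

Summary

You Only Look Once (YOLO) models have been widely used to build real-time object detectors across various domains.
As new YOLO versions are released with increasing frequency, key questions arise.
Are the newer versions always better than the previous ones?
What are the core innovations in each YOLO version, and how do these changes translate into real-world performance gains?
In this paper, we summarize the key innovations from YOLOv1 to YOLOv11 and introduce a comprehensive benchmark called ODverse33, which includes 33 datasets spanning 11 diverse domains (autonomous driving, agricultural, underwater, medical, videogame, industrial, aerial, wildlife, retail, microscopic, and security).
Through extensive experimental results, we explore the practical impact of model improvements in real-world, multi-domain applications.
We hope this study provides guidance to the many users of object detection models and serves as a reference for future real-time object detector development.

Summary (original)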

You Only Look Once (YOLO) models have been widely used for building real-time object detectors across various domains. With the increasing frequency of new YOLO versions being released, key questions arise. Are the newer versions always better than their previous versions? What are the core innovations in each YOLO version, and how do these changes translate into real-world performance gains? In this paper, we summarize the key innovations from YOLOv1 to YOLOv11, introduce a comprehensive benchmark called ODverse33, which includes 33 datasets spanning 11 diverse domains (Autonomous driving, Agricultural, Underwater, Medical, Videogame, Industrial, Aerial, Wildlife, Retail, Microscopic, and Security), and explore the practical impact of model improvements in real-world, multi-domain applications through extensive experimental results. We hope this study can provide some guidance to the extensive users of object detection models and give some references for future real-time object detector development.

arXiv information

Authors	Tianyou Jiang, Yang Zhong
Published	2025-04-11 12:06:13+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV

TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors

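Posted: April 14, 2025 | Author: jarxiv

Summary

Adversarial attacks threaten the reliability of machine learning models in critical applications such as autonomous vehicles and defense systems.
As object detectors become more robust with models like YOLOv8, developing effective adversarial methodologies becomes increasingly challenging.
We present Truck Adversarial Camouflage Optimization (TACO), a novel framework that generates adversarial camouflage patterns on 3D vehicle models to deceive state-of-the-art object detectors.
Adopting Unreal Engine 5, TACO integrates differentiable rendering with a Photorealistic Rendering Network to optimize adversarial textures targeted at YOLOv8.
To ensure that the generated textures are both effective at deceiving detectors and visually plausible, we introduce the Convolutional Smooth Loss function, a generalized smooth loss function.
Experimental evaluations show that TACO significantly degrades YOLOv8's detection performance, achieving an AP@0.5 of 0.0099 on unseen test data.
Furthermore, these adversarial patterns transfer strongly to other object detection models such as Faster R-CNN and earlier YOLO versions.

Summary (original)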

Adversarial attacks threaten the reliability of machine learning models in critical applications like autonomous vehicles and defense systems. As object detectors become more robust with models like YOLOv8, developing effective adversarial methodologies is increasingly challenging. We present Truck Adversarial Camouflage Optimization (TACO), a novel framework that generates adversarial camouflage patterns on 3D vehicle models to deceive state-of-the-art object detectors. Adopting Unreal Engine 5, TACO integrates differentiable rendering with a Photorealistic Rendering Network to optimize adversarial textures targeted at YOLOv8. To ensure the generated textures are both effective in deceiving detectors and visually plausible, we introduce the Convolutional Smooth Loss function, a generalized smooth loss function. Experimental evaluations demonstrate that TACO significantly degrades YOLOv8’s detection performance, achieving an AP@0.5 of 0.0099 on unseen test data. Furthermore, these adversarial patterns exhibit strong transferability to other object detection models such as Faster R-CNN and earlier YOLO versions.
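For context on what a smooth loss measures, the sketch below implements a plain smooth loss in the total-variation style: the sum of squared differences between neighboring pixels. It is not the paper's Convolutional Smooth Loss, which the abstract describes only as a generalization of a smooth loss. The function name `smooth_loss` and the single-channel toy textures are assumptions for illustration.

```python
def smooth_loss(texture):
    """Sum of squared differences between horizontally and vertically
    adjacent pixels of a single-channel texture (list of rows)."""
    rows, cols = len(texture), len(texture[0])
    loss = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                loss += (texture[r][c] - texture[r][c + 1]) ** 2
            if r + 1 < rows:  # lower neighbor
                loss += (texture[r][c] - texture[r + 1][c]) ** 2
    return loss


flat = [[0.5, 0.5], [0.5, 0.5]]      # uniform patch
checker = [[0.0, 1.0], [1.0, 0.0]]   # high-frequency pattern

print(smooth_loss(flat))     # 0.0
print(smooth_loss(checker))  # 4.0
```

Adding a term like this to an adversarial objective discourages noisy, high-frequency textures, which is one common way to keep optimized patterns visually plausible.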

arXiv information

Authors	Adonisz Dimitriu, Tamás Michaletzky, Viktor Remeli
Published	2025-04-11 12:13:02+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV, cs.LG

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Medical Image Classification

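Posted: April 14, 2025 | Author: jarxiv

Summary

In many medical imaging tasks, convolutional neural networks (CNNs) efficiently extract local features hierarchically.
More recently, vision transformers (ViTs) have gained popularity; they use self-attention mechanisms to capture global dependencies but lack the inherent spatial localization of convolutions.
Hybrid models combining CNNs and ViTs have therefore been developed to combine the strengths of both architectures.
However, such hybrid CNN-ViT models are difficult to interpret, which hinders their application in medical imaging.
In this work, we introduce an interpretable-by-design hybrid fully convolutional CNN-Transformer architecture for medical image classification.
Unlike the post-hoc saliency methods widely used for ViTs, our approach generates faithful, localized evidence maps that directly reflect the model's decision process.
We evaluated the method on two medical image classification tasks using color fundus images.
Our model not only achieves state-of-the-art predictive performance compared with both black-box and interpretable models but also provides class-specific sparse evidence maps in a single forward pass.
The code is available at https://anonymous.4open.science/r/Expl-CNN-Transformer/.

Summary (original)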

In many medical imaging tasks, convolutional neural networks (CNNs) efficiently extract local features hierarchically. More recently, vision transformers (ViTs) have gained popularity, using self-attention mechanisms to capture global dependencies, but lacking the inherent spatial localization of convolutions. Therefore, hybrid models combining CNNs and ViTs have been developed to combine the strengths of both architectures. However, such hybrid CNN-ViT models are difficult to interpret, which hinders their application in medical imaging. In this work, we introduce an interpretable-by-design hybrid fully convolutional CNN-Transformer architecture for medical image classification. Unlike widely used post-hoc saliency methods for ViTs, our approach generates faithful and localized evidence maps that directly reflect the model’s decision process. We evaluated our method on two medical image classification tasks using color fundus images. Our model not only achieves state-of-the-art predictive performance compared to both black-box and interpretable models but also provides class-specific sparse evidence maps in a single forward pass. The code is available at: https://anonymous.4open.science/r/Expl-CNN-Transformer/.

arXiv information

Authors	Kerol Djoumessi, Samuel Ofosu Mensah, Philipp Berens
Published	2025-04-11 12:15:22+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

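Posted: April 14, 2025 | Author: jarxiv

Summary

The Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space and has become a key component of most Latent Video Diffusion Models (LVDMs) for reducing model training costs.
However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs.
Moreover, the block-wise inference method adopted by most LVDMs can lead to discontinuities in the latent space when processing long-duration videos.
The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information.
The wavelet transform can decompose videos into multiple frequency-domain components and improve efficiency significantly. We therefore propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages a multi-level wavelet transform to facilitate low-frequency energy flow into the latent representation.
Furthermore, we introduce a method called Causal Cache, which maintains the integrity of the latent space during block-wise inference.
Compared with state-of-the-art video VAEs, WF-VAE demonstrates superior performance in both PSNR and LPIPS metrics, achieving 2x higher throughput and 4x lower memory consumption while maintaining competitive reconstruction quality.
Our code and models are available at https://github.com/PKU-YuanGroup/WF-VAE.

Summary (original)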

The Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs. Moreover, the block-wise inference method adopted by most LVDMs can lead to discontinuities of latent space when processing long-duration videos. The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information. Wavelet transform can decompose videos into multiple frequency-domain components and improve the efficiency significantly. We thus propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages multi-level wavelet transform to facilitate low-frequency energy flow into latent representation. Furthermore, we introduce a method called Causal Cache, which maintains the integrity of latent space during block-wise inference. Compared to state-of-the-art video VAEs, WF-VAE demonstrates superior performance in both PSNR and LPIPS metrics, achieving 2x higher throughput and 4x lower memory consumption while maintaining competitive reconstruction quality. Our code and models are available at https://github.com/PKU-YuanGroup/WF-VAE.
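The claim that a wavelet transform separates a signal into frequency components, with most of the energy of smooth content in the low-frequency band, can be checked on a toy example. The sketch below applies one level of the orthonormal 1D Haar transform in pure Python. It is only an illustration of the decomposition idea: the abstract does not say which wavelet WF-VAE uses, and the model applies a multi-level transform to videos rather than to a 1D list. The names `haar_step` and `energy` and the sample signal are assumptions.

```python
import math


def haar_step(signal):
    """One level of the orthonormal 1D Haar transform.

    Expects an even-length sequence. Returns (low, high): scaled pairwise
    sums (low-frequency band) and differences (high-frequency band).
    """
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def energy(xs):
    """Sum of squared values."""
    return sum(x * x for x in xs)


# A piecewise-constant signal: adjacent samples are equal in every pair.
signal = [4.0, 4.0, 6.0, 6.0, 5.0, 5.0, 3.0, 3.0]
low, high = haar_step(signal)

print(energy(signal), energy(low), energy(high))
```

For this signal the high-frequency band is all zeros, so the low-frequency band carries all of the energy in half as many coefficients. Applying `haar_step` again to `low` gives the next level of a multi-level decomposition.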

arXiv information

Authors	Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
Published	2025-04-11 12:31:07+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CV

Open-CD: A Comprehensive Toolbox for Change Detection

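Posted: April 14, 2025 | Author: jarxiv

Summary

We present Open-CD, a change detection toolbox that contains a rich set of change detection methods as well as related components and modules.
The toolbox started from a series of open-source general vision task tools, including OpenMMLab Toolkits and PyTorch Image Models, and has gradually evolved into a unified platform that covers many popular change detection methods and contemporary modules.
It not only includes training and inference code but also provides useful scripts for data analysis.
We believe this toolbox is by far the most complete change detection toolbox.
In this report, we introduce the various features, supported methods, and applications of Open-CD.
In addition, we conduct a benchmarking study of different methods and components.
We hope that the toolbox and benchmark will serve the growing research community by providing a flexible toolkit for reimplementing existing methods and developing new change detectors.
Code and models are available at https://github.com/likyoo/open-cd.
As a pioneering step, this report also includes brief descriptions of the algorithms supported in Open-CD, mainly contributed by their authors.
We sincerely encourage researchers in this field to participate in this project and work together to create a more open community.
This toolkit and report will be kept updated.

Summary (original)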

We present Open-CD, a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. The toolbox started from a series of open source general vision task tools, including OpenMMLab Toolkits, PyTorch Image Models, etc. It gradually evolves into a unified platform that covers many popular change detection methods and contemporary modules. It not only includes training and inference codes, but also provides some useful scripts for data analysis. We believe this toolbox is by far the most complete change detection toolbox. In this report, we introduce the various features, supported methods and applications of Open-CD. In addition, we also conduct a benchmarking study on different methods and components. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new change detectors. Code and models are available at https://github.com/likyoo/open-cd. Pioneeringly, this report also includes brief descriptions of the algorithms supported in Open-CD, mainly contributed by their authors. We sincerely encourage researchers in this field to participate in this project and work together to create a more open community. This toolkit and report will be kept updated.

arXiv information

Authors	Kaiyu Li, Jiawei Jiang, Andrea Codegoni, Chengxi Han, Yupeng Deng, Keyan Chen, Zhuo Zheng, Hao Chen, Ziyuan Liu, Yuantao Gu, Zhengxia Zou, Zhenwei Shi, Sheng Fang, Deyu Meng, Zhi Wang, Xiangyong Cao
Published	2025-04-11 12:42:25+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV
