jarxiv | Japanese arxiv | Page 524

CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

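Posted: May 14, 2025 | Author: jarxiv

Summary

Partial differential equations (PDEs) are fundamental to modeling physical systems, but solving them remains a complex challenge.
Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability.
This work frames PDE solving as a code generation task and introduces CodePDE, the first inference framework for generating PDE solvers with large language models (LLMs).
Using advanced inference-time algorithms and scaling strategies, CodePDE unlocks key LLM capabilities for PDE solving: reasoning, debugging, self-refinement, and test-time scaling, all without task-specific tuning.
CodePDE achieves superhuman performance across a range of representative PDE problems.
The authors also present a systematic empirical analysis of LLM-generated solvers, examining their accuracy, efficiency, and choice of numerical scheme.
The findings highlight both the promise and the current limitations of LLMs in PDE solving, and offer a new perspective on solver design and opportunities for future model development.
The code is available at https://github.com/LithiumDA/CodePDE.

Abstract (original)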

Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability. In this work, we frame PDE solving as a code generation task and introduce CodePDE, the first inference framework for generating PDE solvers using large language models (LLMs). Leveraging advanced inference-time algorithms and scaling strategies, CodePDE unlocks critical capacities of LLMs for PDE solving: reasoning, debugging, self-refinement, and test-time scaling, all without task-specific tuning. CodePDE achieves superhuman performance across a range of representative PDE problems. We also present a systematic empirical analysis of LLM-generated solvers, analyzing their accuracy, efficiency, and numerical scheme choices. Our findings highlight the promise and the current limitations of LLMs in PDE solving, offering a new perspective on solver design and opportunities for future model development. Our code is available at https://github.com/LithiumDA/CodePDE.
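
The test-time scaling idea in the abstract (sample several candidate solvers, run them, keep the best) can be illustrated with a minimal sketch. This is not the CodePDE implementation: the candidate solvers below are hand-written stand-ins for LLM-generated code, and the names `solve_heat_explicit`, `relative_l2_error`, and `best_of_n`, the 1D heat-equation test problem, and the error metric are all assumptions made for illustration.

```python
import math


def solve_heat_explicit(nx, nt, t_end=0.1, alpha=1.0):
    """Explicit finite differences for u_t = alpha * u_xx on [0, 1].

    Boundary values are fixed at 0 and the initial condition is sin(pi * x).
    """
    dx = 1.0 / (nx - 1)          # raises ZeroDivisionError when nx == 1
    dt = t_end / nt
    r = alpha * dt / dx ** 2     # the explicit scheme is stable only for r <= 0.5
    u = [math.sin(math.pi * i * dx) for i in range(nx)]
    for _ in range(nt):
        interior = [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
                    for i in range(1, nx - 1)]
        u = [0.0] + interior + [0.0]
    return u


def relative_l2_error(u, t_end=0.1, alpha=1.0):
    """Relative L2 error against the analytical solution of the test problem."""
    dx = 1.0 / (len(u) - 1)
    decay = math.exp(-alpha * math.pi ** 2 * t_end)
    exact = [decay * math.sin(math.pi * i * dx) for i in range(len(u))]
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, exact)))
    den = math.sqrt(sum(b ** 2 for b in exact))
    return num / den


def best_of_n(candidates):
    """Run every candidate, drop the ones that crash or diverge, keep the best."""
    scored = []
    for name, solver in candidates.items():
        try:
            err = relative_l2_error(solver())
        except Exception:
            continue             # a real system might feed the traceback back for debugging
        if math.isfinite(err):
            scored.append((err, name))
    if not scored:
        raise RuntimeError("no candidate produced a usable solution")
    err, name = min(scored)
    return name, err


# Hypothetical candidates standing in for sampled solver programs.
candidates = {
    "too_few_steps": lambda: solve_heat_explicit(nx=21, nt=20),   # r = 2.0, violates r <= 0.5
    "stable": lambda: solve_heat_explicit(nx=21, nt=200),         # r = 0.2
    "buggy": lambda: solve_heat_explicit(nx=1, nt=10),            # crashes on division by zero
}

best_name, best_err = best_of_n(candidates)
print(best_name, best_err)
```

The selection step here uses a known analytical solution; when no reference solution exists, a different score (for example a PDE residual) would be needed.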

arXiv information

Authors	Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar
Published	2025-05-13 17:58:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG, cs.NA, math.NA

Dynamic Snake Upsampling Operater and Boundary-Skeleton Weighted Loss for Tubular Structure Segmentation

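Posted: May 14, 2025 | Author: jarxiv

Summary

Accurate segmentation of tubular topological structures (e.g., fissures and vasculature) is critical in many fields and ensures reliable downstream quantitative analysis and modeling.
However, in dense prediction tasks such as semantic segmentation and super-resolution, conventional upsampling operators cannot accommodate the slenderness of tubular structures or the curvature of their morphology.
This paper introduces a dynamic snake upsampling operator and a boundary-skeleton weighted loss tailored to topological tubular structures.
Specifically, the authors design a snake upsampling operator based on an adaptive sampling domain: it dynamically adjusts the sampling stride according to the feature map and selects a set of subpixel sampling points along a serpentine path, enabling more accurate subpixel-level feature recovery for tubular structures.
They also propose a skeleton-to-boundary increasing weighted loss that trades off main-body and boundary weight allocation based on the mask class ratio and a distance field, preserving main-body overlap while sharpening the focus on target topological continuity and boundary alignment precision.
Experiments across datasets from various domains and several backbone networks show that the plug-and-play dynamic snake upsampling operator and the boundary-skeleton weighted loss improve both pixel-wise segmentation accuracy and the topological consistency of the results.

Abstract (original)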

Accurate segmentation of tubular topological structures (e.g., fissures and vasculature) is critical in various fields to guarantee dependable downstream quantitative analysis and modeling. However, in dense prediction tasks such as semantic segmentation and super-resolution, conventional upsampling operators cannot accommodate the slenderness of tubular structures and the curvature of morphology. This paper introduces a dynamic snake upsampling operator and a boundary-skeleton weighted loss tailored for topological tubular structures. Specifically, we design a snake upsampling operator based on an adaptive sampling domain, which dynamically adjusts the sampling stride according to the feature map and selects a set of subpixel sampling points along the serpentine path, enabling more accurate subpixel-level feature recovery for tubular structures. Meanwhile, we propose a skeleton-to-boundary increasing weighted loss that trades off main body and boundary weight allocation based on mask class ratio and distance field, preserving main body overlap while enhancing focus on target topological continuity and boundary alignment precision. Experiments across various domain datasets and backbone networks show that this plug-and-play dynamic snake upsampling operator and boundary-skeleton weighted loss boost both pixel-wise segmentation accuracy and topological consistency of results.

arXiv information

Authors	Yiqi Chen, Ganghai Huang, Sheng Zhang, Jianglin Dai
Published	2025-05-13 12:56:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting

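Posted: May 14, 2025 | Author: jarxiv

Summary

Source-free domain adaptation (SFDA) for segmentation aims to adapt a model trained in the source domain so that it performs well in the target domain, given only the source model and unlabeled target data.
Inspired by the recent success of the Segment Anything Model (SAM), which can segment images of various modalities and domains given human-annotated prompts such as bounding boxes or points, the authors explore for the first time the potential of SAM for SFDA by automatically finding an accurate bounding box prompt.
They find that bounding boxes generated directly with existing SFDA approaches are defective because of the domain gap. To tackle this, they propose a novel Dual Feature Guided (DFG) auto-prompting approach to search for the box prompt.
Specifically, the source model is first trained in a feature aggregation phase, which not only preliminarily adapts the source model to the target domain but also builds a feature distribution well prepared for box prompt search.
In the second phase, based on two observations about the feature distribution, the box prompt is gradually expanded under the guidance of the target model feature and the SAM feature, which handle class-wise clustered target features and class-wise dispersed target features, respectively.
To remove potentially enlarged false-positive regions caused by over-confident predictions of the target model, the refined pseudo-labels produced by SAM are further postprocessed based on connectivity analysis.
Experiments on 3D and 2D datasets indicate that the approach yields superior performance compared to conventional methods.
Code is available at https://github.com/zheangh/DFG.

Abstract (original)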

Source-free domain adaptation (SFDA) for segmentation aims at adapting a model trained in the source domain to perform well in the target domain with only the source model and unlabeled target data. Inspired by the recent success of Segment Anything Model (SAM) which exhibits the generality of segmenting images of various modalities and in different domains given human-annotated prompts like bounding boxes or points, we for the first time explore the potentials of Segment Anything Model for SFDA via automatically finding an accurate bounding box prompt. We find that the bounding boxes directly generated with existing SFDA approaches are defective due to the domain gap. To tackle this issue, we propose a novel Dual Feature Guided (DFG) auto-prompting approach to search for the box prompt. Specifically, the source model is first trained in a feature aggregation phase, which not only preliminarily adapts the source model to the target domain but also builds a feature distribution well-prepared for box prompt search. In the second phase, based on two feature distribution observations, we gradually expand the box prompt with the guidance of the target model feature and the SAM feature to handle the class-wise clustered target features and the class-wise dispersed target features, respectively. To remove the potentially enlarged false positive regions caused by the over-confident prediction of the target model, the refined pseudo-labels produced by SAM are further postprocessed based on connectivity analysis. Experiments on 3D and 2D datasets indicate that our approach yields superior performance compared to conventional methods. Code is available at https://github.com/zheangh/DFG.
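
The abstract does not say which connectivity analysis is applied to the pseudo-labels. One common form, shown here purely as an assumption, is to keep only the largest connected foreground region of a binary mask and discard isolated false positives. The function name `largest_component` and the 4-connectivity choice are illustrative, not taken from the paper.

```python
from collections import deque


def largest_component(mask):
    """Return a copy of a 2D binary mask keeping only its largest 4-connected region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


pseudo_label = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
print(largest_component(pseudo_label))
```

For 3D volumes the same search would run over six (or more) neighbors per voxel.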

arXiv information

Authors	Zheang Huai, Hui Tang, Yi Li, Zhuangzhuang Chen, Xiaomeng Li
Published	2025-05-13 13:00:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

GradMix: Gradient-based Selective Mixup for Robust Data Augmentation in Class-Incremental Learning

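Posted: May 14, 2025 | Author: jarxiv

Summary

In continual learning, acquiring new knowledge while retaining previous knowledge is a significant challenge.
Existing methods often use experience replay, which stores a small portion of previous task data for training.
Within experience replay, data augmentation has emerged as a promising way to further improve model performance by mixing the limited previous task data with the plentiful current task data.
However, the authors show theoretically and empirically that training on mixed samples from random sample pairs can harm knowledge of previous tasks and cause greater catastrophic forgetting.
They then propose GradMix, a robust data augmentation method designed specifically to mitigate catastrophic forgetting in class-incremental learning.
GradMix performs gradient-based selective mixup using a class-based criterion: it mixes only samples from helpful class pairs, not from detrimental ones, to reduce catastrophic forgetting.
Experiments on various real datasets show that GradMix outperforms data augmentation baselines in accuracy by minimizing the forgetting of previous knowledge.

Abstract (original)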

In the context of continual learning, acquiring new knowledge while maintaining previous knowledge presents a significant challenge. Existing methods often use experience replay techniques that store a small portion of previous task data for training. In experience replay approaches, data augmentation has emerged as a promising strategy to further improve the model performance by mixing limited previous task data with sufficient current task data. However, we theoretically and empirically analyze that training with mixed samples from random sample pairs may harm the knowledge of previous tasks and cause greater catastrophic forgetting. We then propose GradMix, a robust data augmentation method specifically designed for mitigating catastrophic forgetting in class-incremental learning. GradMix performs gradient-based selective mixup using a class-based criterion that mixes only samples from helpful class pairs and not from detrimental class pairs for reducing catastrophic forgetting. Our experiments on various real datasets show that GradMix outperforms data augmentation baselines in accuracy by minimizing the forgetting of previous knowledge.
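
As a rough illustration of selective mixup, the sketch below applies standard mixup (a convex combination of two inputs and their labels) only when the class pair is in an allowed set. How GradMix derives that set from gradients is not described in the abstract, so `helpful_pairs` is simply passed in as a given; the function names and the fallback of returning the first sample unmixed are assumptions.

```python
def one_hot(label, num_classes):
    """One-hot encode an integer class label as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two inputs and of their soft labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def selective_mixup(sample_a, sample_b, helpful_pairs, num_classes, lam):
    """Mix two (features, class) samples only if their class pair is marked helpful.

    `helpful_pairs` is a set of frozensets of class indices. In this sketch it is
    supplied by the caller; the gradient-based criterion that would produce it is
    not implemented here.
    """
    x_a, c_a = sample_a
    x_b, c_b = sample_b
    y_a = one_hot(c_a, num_classes)
    if frozenset((c_a, c_b)) not in helpful_pairs:
        return list(x_a), y_a          # fall back to the unmixed first sample
    return mixup(x_a, y_a, x_b, one_hot(c_b, num_classes), lam)


helpful_pairs = {frozenset((0, 1))}    # hypothetical: classes 0 and 1 mix safely
old_sample = ([1.0, 0.0], 0)
new_sample = ([0.0, 1.0], 1)
other_sample = ([2.0, 2.0], 2)

print(selective_mixup(old_sample, new_sample, helpful_pairs, 3, lam=0.5))
print(selective_mixup(old_sample, other_sample, helpful_pairs, 3, lam=0.5))
```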

arXiv information

Authors	Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang
Published	2025-05-13 13:01:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, cs.LG

The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning

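Posted: May 14, 2025 | Author: jarxiv

Summary

This research investigates the use of computer vision for rapid, accurate, and non-invasive food quality assessment, focusing on the novel challenge of grading raspberries into five distinct classes in real time in an industrial environment as the fruit moves along a conveyor belt.
To address this, a dedicated raspberry dataset, RaspGrade, was acquired and meticulously annotated.
Instance segmentation experiments showed that accurate fruit-level masks can be obtained.
However, classifying certain raspberry grades is challenging because of color similarity and occlusion, while other grades are more readily distinguished by color.
The acquired and annotated RaspGrade dataset is accessible on HuggingFace at https://huggingface.co/datasets/FBK-TeV/RaspGrade.

Abstract (original)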

This research investigates the application of computer vision for rapid, accurate, and non-invasive food quality assessment, focusing on the novel challenge of real-time raspberry grading into five distinct classes within an industrial environment as the fruits move along a conveyor belt. To address this, a dedicated dataset of raspberries, namely RaspGrade, was acquired and meticulously annotated. Instance segmentation experiments revealed that accurate fruit-level masks can be obtained; however, the classification of certain raspberry grades presents challenges due to color similarities and occlusion, while others are more readily distinguishable based on color. The acquired and annotated RaspGrade dataset is accessible on HuggingFace at: https://huggingface.co/datasets/FBK-TeV/RaspGrade.

arXiv information

Authors	Mohamed Lamine Mekhalfi, Paul Chippendale, Fabio Poiesi, Samuele Bonecher, Gilberto Osler, Nicola Zancanella
Published	2025-05-13 13:07:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data

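Posted: May 14, 2025 | Author: jarxiv

Summary

Gliomas are aggressive brain tumors that require accurate imaging-based diagnosis, and segmentation plays a critical role in evaluating morphology and in treatment decisions.
Manual delineation of gliomas is time-consuming and prone to variability, which motivates the use of deep learning to improve consistency and ease clinical workload.
However, existing methods often fail to fully exploit the information available in multi-parametric MRI (mp-MRI), particularly inter-slice contextual features, and they typically require considerable computational resources while lacking robustness across tumor types.
The authors present GBT-SAM, a parameter-efficient deep learning framework that adapts the Segment Anything Model (SAM), a large-scale vision model, to volumetric mp-MRI data.
GBT-SAM reduces input complexity by selecting fewer than 2.6% of slices per scan while incorporating all four MRI modalities, preserving essential tumor-related information at minimal cost.
In addition, the model is trained with a two-step fine-tuning strategy that incorporates a depth-aware module to capture inter-slice correlations and lightweight adaptation layers, resulting in just 6.5M trainable parameters, the lowest among SAM-based approaches.
GBT-SAM achieves a Dice score of 93.54 on the BraTS Adult Glioma dataset and shows robust performance on the Meningioma, Pediatric Glioma, and Sub-Saharan Glioma datasets.
These results highlight GBT-SAM's potential as a computationally efficient and domain-robust framework for brain tumor segmentation using mp-MRI.
The code and models are available at https://github.com/vpulab/med-sam-brain.

Abstract (original)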

Gliomas are aggressive brain tumors that require accurate imaging-based diagnosis, with segmentation playing a critical role in evaluating morphology and treatment decisions. Manual delineation of gliomas is time-consuming and prone to variability, motivating the use of deep learning to improve consistency and alleviate clinical workload. However, existing methods often fail to fully exploit the information available in multi-parametric MRI (mp-MRI), particularly inter-slice contextual features, and typically require considerable computational resources while lacking robustness across tumor type variations. We present GBT-SAM, a parameter-efficient deep learning framework that adapts the Segment Anything Model (SAM), a large-scale vision model, to volumetric mp-MRI data. GBT-SAM reduces input complexity by selecting fewer than 2.6% of slices per scan while incorporating all four MRI modalities, preserving essential tumor-related information with minimal cost. Furthermore, our model is trained by a two-step fine-tuning strategy that incorporates a depth-aware module to capture inter-slice correlations and lightweight adaptation layers, resulting in just 6.5M trainable parameters, which is the lowest among SAM-based approaches. GBT-SAM achieves a Dice Score of 93.54 on the BraTS Adult Glioma dataset and demonstrates robust performance on Meningioma, Pediatric Glioma, and Sub-Saharan Glioma datasets. These results highlight GBT-SAM’s potential as a computationally efficient and domain-robust framework for brain tumor segmentation using mp-MRI. Our code and models are available at https://github.com/vpulab/med-sam-brain.
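
For readers unfamiliar with the metric, the Dice score of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened masks; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions, and the evaluation protocol used in the paper may differ.

```python
def dice_score(pred, target):
    """Dice coefficient of two flattened binary masks (values 0 or 1), in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0               # both masks empty: treated as perfect agreement
    return 2.0 * intersection / total


predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
print(dice_score(predicted, reference))
```

Assuming the reported 93.54 is a percentage, it would correspond to about 0.9354 on the 0 to 1 scale this function returns.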

arXiv information

Authors	Cecilia Diana-Albelda, Roberto Alcover-Couso, Álvaro García-Martín, Jesus Bescos, Marcos Escudero-Viñolo
Published	2025-05-13 13:15:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art

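Posted: May 14, 2025 | Author: jarxiv

Summary

The recent proliferation of generative AI tools for visual content creation, particularly for visual artworks, has raised serious concerns about copyright infringement and forgery.
The large-scale datasets used to train these models often contain a mixture of copyrighted and non-copyrighted artworks.
Given the tendency of generative models to memorize training patterns, they are susceptible to varying degrees of copyright violation.
Building on the recently proposed DeepfakeArt Challenge benchmark, this work introduces DFA-CON, a contrastive learning framework designed to detect copyright-infringing or forged AI-generated art.
DFA-CON learns a discriminative representation space that places original artworks and their forged counterparts close together within a contrastive learning framework.
The model is trained across multiple attack types, including inpainting, style transfer, adversarial perturbation, and CutMix.
Evaluation results show robust detection performance across most attack types, outperforming recent pretrained foundation models.
Code and model checkpoints will be released publicly upon acceptance.

Abstract (original)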

Recent proliferation of generative AI tools for visual content creation, particularly in the context of visual artworks, has raised serious concerns about copyright infringement and forgery. The large-scale datasets used to train these models often contain a mixture of copyrighted and non-copyrighted artworks. Given the tendency of generative models to memorize training patterns, they are susceptible to varying degrees of copyright violation. Building on the recently proposed DeepfakeArt Challenge benchmark, this work introduces DFA-CON, a contrastive learning framework designed to detect copyright-infringing or forged AI-generated art. DFA-CON learns a discriminative representation space, posing affinity among original artworks and their forged counterparts within a contrastive learning framework. The model is trained across multiple attack types, including inpainting, style transfer, adversarial perturbation, and cutmix. Evaluation results demonstrate robust detection performance across most attack types, outperforming recent pretrained foundation models. Code and model checkpoints will be released publicly upon acceptance.
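
The abstract does not specify the contrastive objective. As a generic illustration of how a contrastive loss pulls an original artwork's embedding toward its forged counterpart and away from unrelated works, the sketch below computes an InfoNCE-style loss with cosine similarity. The loss form, the temperature value, the toy two-dimensional embeddings, and the function names are all assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the positive than to negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: an original, a forgery derived from it, and unrelated works.
original = [1.0, 0.0]
forged = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.2]]

aligned_loss = info_nce(original, forged, unrelated)
misaligned_loss = info_nce(original, unrelated[0], [forged, unrelated[1]])
print(aligned_loss, misaligned_loss)
```

Training an encoder to minimize such a loss would move forged variants toward the artwork they copy, which is the affinity the abstract describes.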

arXiv information

Authors	Haroon Wahab, Hassan Ugail, Irfan Mehmood
Published	2025-05-13 13:23:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, cs.LG

Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection

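Posted: May 14, 2025 | Author: jarxiv

Summary

Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, in which the model reconstructs masked spatiotemporal tokens using information from the visible tokens.
However, a key challenge in such approaches is selecting an appropriate masking strategy.
Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage key motion priors, optical flow, and semantic cues from externally pre-trained models.
This work introduces a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos.
The authors also propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO).
The model allows aggressive masking without compromising performance on the downstream task of action recognition, while keeping pre-training memory efficient.
Extensive experiments across four benchmarks, Something-Something v2, Kinetics-400, UCF101, and HMDB51, demonstrate the effectiveness, transferability, generalization, and efficiency of the approach compared to other state-of-the-art methods.

Abstract (original)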

Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage key motion priors, optical flow and semantic cues from externally pre-trained models. In this work, we introduce a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos. Additionally, we propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO). We show that our model allows for aggressive masking without compromising performance on the downstream task of action recognition while also ensuring that the pre-training remains memory efficient. Extensive experiments of the proposed approach across four benchmarks, including Something-Something v2, Kinetics-400, UCF101, and HMDB51, demonstrate the effectiveness, transferability, generalization, and efficiency of our work compared to other state-of-the-art methods.
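
The two predefined baselines named in the abstract can be sketched as follows. This shows only the fixed strategies, not the adaptive TATS sampler. The per-frame formulation of random masking, the token indexing, and the function names are assumptions chosen for illustration.

```python
import random


def random_mask(num_frames, tokens_per_frame, ratio, seed=0):
    """Random masking: each frame draws its own set of masked token indices."""
    rng = random.Random(seed)
    n_masked = round(ratio * tokens_per_frame)
    return [set(rng.sample(range(tokens_per_frame), n_masked))
            for _ in range(num_frames)]


def tube_mask(num_frames, tokens_per_frame, ratio, seed=0):
    """Tube masking: one spatial mask is drawn and repeated across all frames."""
    rng = random.Random(seed)
    n_masked = round(ratio * tokens_per_frame)
    spatial = set(rng.sample(range(tokens_per_frame), n_masked))
    return [set(spatial) for _ in range(num_frames)]


# 4 frames of 16 spatial tokens each, 75% of the tokens masked.
tube = tube_mask(num_frames=4, tokens_per_frame=16, ratio=0.75)
rand = random_mask(num_frames=4, tokens_per_frame=16, ratio=0.75)
print(sorted(tube[0]))
print([sorted(frame) for frame in rand])
```

Because a tube hides the same spatial positions in every frame, the model cannot recover a masked patch by copying it from a neighboring frame. An adaptive sampler replaces the fixed draw with a learned choice of tokens.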

arXiv information

Authors	Ayush K. Rai, Kyle Min, Tarun Krishna, Feiyan Hu, Alan F. Smeaton, Noel E. O’Connor
Published	2025-05-13 13:35:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections

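Posted: May 14, 2025 | Author: jarxiv

Summary

Rapid advances in deep learning for computer vision have driven the adoption of RGB camera-based adaptive traffic light systems to improve traffic safety and pedestrian comfort.
However, these systems often overlook the needs of people with mobility restrictions.
Moreover, RGB cameras pose significant challenges, including limited detection performance in adverse weather or low visibility, as well as heightened privacy concerns.
To address these issues, the authors propose a fully automated, thermal detector-based traffic light system that dynamically adjusts signal durations for individuals with walking impairments or a mobility burden and triggers the auditory signal for visually impaired individuals, moving toward barrier-free intersections for all users.
To this end, they build the thermal dataset for people with mobility restrictions (TD4PWMR), designed to capture diverse pedestrian scenarios, with a particular focus on individuals with mobility aids or a mobility burden under varying environmental conditions such as different lighting, weather, and crowded urban settings.
While thermal imaging offers advantages in privacy and robustness to adverse conditions, it also introduces inherent hurdles for object detection because it lacks color and fine texture detail and generally has lower resolution.
To overcome these limitations, they develop YOLO-Thermal, a novel variant of the YOLO architecture that integrates advanced feature extraction and attention mechanisms to improve detection accuracy and robustness in thermal imaging.
Experiments show that the proposed thermal detector outperforms existing detectors and that the proposed traffic light system effectively improves barrier-free intersections.
The source code and dataset are available at https://github.com/leon2014dresden/YOLO-THERMAL.

Abstract (original)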

Rapid advances in deep learning for computer vision have driven the adoption of RGB camera-based adaptive traffic light systems to improve traffic safety and pedestrian comfort. However, these systems often overlook the needs of people with mobility restrictions. Moreover, the use of RGB cameras presents significant challenges, including limited detection performance under adverse weather or low-visibility conditions, as well as heightened privacy concerns. To address these issues, we propose a fully automated, thermal detector-based traffic light system that dynamically adjusts signal durations for individuals with walking impairments or mobility burden and triggers the auditory signal for visually impaired individuals, thereby advancing towards barrier-free intersection for all users. To this end, we build the thermal dataset for people with mobility restrictions (TD4PWMR), designed to capture diverse pedestrian scenarios, particularly focusing on individuals with mobility aids or mobility burden under varying environmental conditions, such as different lighting, weather, and crowded urban settings. While thermal imaging offers advantages in terms of privacy and robustness to adverse conditions, it also introduces inherent hurdles for object detection due to its lack of color and fine texture details and generally lower resolution of thermal images. To overcome these limitations, we develop YOLO-Thermal, a novel variant of the YOLO architecture that integrates advanced feature extraction and attention mechanisms for enhanced detection accuracy and robustness in thermal imaging. Experiments demonstrate that the proposed thermal detector outperforms existing detectors, while the proposed traffic light system effectively enhances barrier-free intersection. The source codes and dataset are available at https://github.com/leon2014dresden/YOLO-THERMAL.

arXiv information

Authors	Xiao Ni, Carsten Kuehnel, Xiaoyi Jiang
Published	2025-05-13 13:44:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

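Posted: May 14, 2025 | Author: jarxiv

Summary

Surgical scene segmentation is critical in computer-assisted surgery and vital for improving surgical quality and patient outcomes.
Recently, referring surgical segmentation has been emerging, given its advantage of offering surgeons an interactive way to segment the target object.
However, existing methods are limited by low efficiency and short-term tracking, which hinders their applicability in complex real-world surgical scenarios.
This paper introduces ReSurgSAM2, a two-stage surgical referring segmentation framework that uses Segment Anything Model 2 to perform text-referred target detection, followed by tracking with reliable initial frame identification and diversity-driven long-term memory.
For the detection stage, the authors propose a cross-modal spatial-temporal Mamba to generate precise detection and segmentation results.
Based on these results, a credible initial frame selection strategy identifies a reliable frame for the subsequent tracking.
Once the initial frame is selected, the method moves to the tracking stage, where a diversity-driven memory mechanism maintains a credible and diverse memory bank to ensure consistent long-term tracking.
Extensive experiments show that ReSurgSAM2 achieves substantial improvements in accuracy and efficiency over existing methods, operating in real time at 61.2 FPS.
The code and datasets will be available at https://github.com/jinlab-imvr/ReSurgSAM2.

Abstract (original)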

Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation is emerging, given its advantage of providing surgeons with an interactive experience to segment the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicability in complex real-world surgical scenarios. In this paper, we introduce ReSurgSAM2, a two-stage surgical referring segmentation framework that leverages Segment Anything Model 2 to perform text-referred target detection, followed by tracking with reliable initial frame identification and diversity-driven long-term memory. For the detection stage, we propose a cross-modal spatial-temporal Mamba to generate precise detection and segmentation results. Based on these results, our credible initial frame selection strategy identifies the reliable frame for the subsequent tracking. Upon selecting the initial frame, our method transitions to the tracking stage, where it incorporates a diversity-driven memory mechanism that maintains a credible and diverse memory bank, ensuring consistent long-term tracking. Extensive experiments demonstrate that ReSurgSAM2 achieves substantial improvements in accuracy and efficiency compared to existing methods, operating in real-time at 61.2 FPS. Our code and datasets will be available at https://github.com/jinlab-imvr/ReSurgSAM2.

arXiv information

Authors	Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin
Published	2025-05-13 13:56:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV, eess.IV, q-bio.TO
