jarxiv | Japanese arxiv | Page 1618

UNB StepUP: A footStep database for gait analysis and recognition using Underfoot Pressure

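Posted: February 25, 2025 | Author: jarxiv

Summary

Gait refers to the pattern of limb movements produced during walking. It is unique to each individual because of both physical and behavioural traits.
Walking patterns have been widely studied in biometrics, biomechanics, sports, and rehabilitation.
Traditional methods rely on video and motion capture, but advances in underfoot pressure sensing technology now provide deeper insight into gait.
However, underfoot pressure during walking remains underexplored because large, publicly available datasets are lacking.
To address this, the UNB StepUP database was created. It contains gait pressure data collected with high-resolution pressure sensing tiles (4 sensors/cm², 1.2 m x 3.6 m).
The first release, UNB StepUP-P150, contains more than 200,000 footsteps from 150 individuals across several walking speeds (preferred, slow-to-stop, fast, and slow) and footwear types (barefoot, standard shoes, and two personal shoes).
As the largest and most comprehensive dataset of its kind, it supports biometric gait recognition while opening new research opportunities in biomechanics and deep learning.
The UNB StepUP-P150 dataset sets a new benchmark for pressure-based gait analysis and recognition.

Summary (original)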

Gait refers to the patterns of limb movement generated during walking, which are unique to each individual due to both physical and behavioural traits. Walking patterns have been widely studied in biometrics, biomechanics, sports, and rehabilitation. While traditional methods rely on video and motion capture, advances in underfoot pressure sensing technology now offer deeper insights into gait. However, underfoot pressures during walking remain underexplored due to the lack of large, publicly accessible datasets. To address this, the UNB StepUP database was created, featuring gait pressure data collected with high-resolution pressure sensing tiles (4 sensors/cm², 1.2 m by 3.6 m). Its first release, UNB StepUP-P150, includes over 200,000 footsteps from 150 individuals across various walking speeds (preferred, slow-to-stop, fast, and slow) and footwear types (barefoot, standard shoes, and two personal shoes). As the largest and most comprehensive dataset of its kind, it supports biometric gait recognition while presenting new research opportunities in biomechanics and deep learning. The UNB StepUP-P150 dataset sets a new benchmark for pressure-based gait analysis and recognition.
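As a rough sanity check on the figures quoted in this abstract, the following Python sketch works out the sensing area, the implied sensor count, and the average number of footsteps per participant. It assumes that the stated density of 4 sensors per square centimetre holds uniformly over the full 1.2 m by 3.6 m area, and it treats 200,000 as a lower bound. The constant and function names are illustrative and are not part of the dataset's tooling.

```python
# Figures quoted in the abstract.
SENSORS_PER_CM2 = 4            # sensor density
WIDTH_M, LENGTH_M = 1.2, 3.6   # dimensions of the sensing area
MIN_FOOTSTEPS = 200_000        # "over 200,000", so a lower bound
PARTICIPANTS = 150


def sensing_area_cm2(width_m: float, length_m: float) -> int:
    """Area in square centimetres, rounding each side to whole centimetres."""
    return round(width_m * 100) * round(length_m * 100)


def sensor_count(width_m: float, length_m: float, density_per_cm2: int) -> int:
    """Implied number of sensors, ASSUMING uniform density over the whole area."""
    return sensing_area_cm2(width_m, length_m) * density_per_cm2


area = sensing_area_cm2(WIDTH_M, LENGTH_M)
sensors = sensor_count(WIDTH_M, LENGTH_M, SENSORS_PER_CM2)
steps_per_person = MIN_FOOTSTEPS / PARTICIPANTS

print(f"area: {area} cm^2")
print(f"implied sensors: {sensors}")
print(f"average footsteps per participant: at least {steps_per_person:.0f}")
```

Under these assumptions the area is 43,200 cm², which implies 172,800 sensors, and each participant contributes at least about 1,333 footsteps on average.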

arXiv information

Authors	Robyn Larracy, Angkoon Phinyomark, Ala Salehi, Eve MacDonald, Saeed Kazemi, Shikder Shafiul Bashar, Aaron Tabor, Erik Scheme
Published	2025-02-24 15:21:02+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.LG

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

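Posted: February 25, 2025 | Author: jarxiv

Summary

Remote sensing image object detection (RSIOD) aims to identify and locate specific objects in satellite or aerial imagery.
However, labeled data is scarce in current RSIOD datasets, which significantly limits the performance of current detection algorithms.
Existing techniques such as data augmentation and semi-supervised learning can ease this scarcity to some extent, but they depend heavily on high-quality labeled data and perform worse on rare object classes.
To address this issue, this paper proposes AeroGen, a layout-controllable diffusion generative model tailored to RSIOD.
To our knowledge, AeroGen is the first model to support generation conditioned on horizontal and rotated bounding boxes at the same time, enabling high-quality synthetic images that meet specific layout and object category requirements.
In addition, we propose an end-to-end data augmentation framework that integrates a diversity-conditioned generator and a filtering mechanism to enhance both the diversity and the quality of the generated data.
Experimental results show that the synthetic data produced by our method is of high quality and diversity.
Furthermore, the synthetic RSIOD data can significantly improve the detection performance of existing RSIOD models: the mAP on the DIOR, DIOR-R, and HRSC datasets improves by 3.7%, 4.3%, and 2.43%, respectively.
The code is available at https://github.com/Sonettoo/AeroGen.

Summary (original)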

Remote sensing image object detection (RSIOD) aims to identify and locate specific objects within satellite or aerial imagery. However, there is a scarcity of labeled data in current RSIOD datasets, which significantly limits the performance of current detection algorithms. Although existing techniques, e.g., data augmentation and semi-supervised learning, can mitigate this scarcity issue to some extent, they are heavily dependent on high-quality labeled data and perform worse in rare object classes. To address this issue, this paper proposes a layout-controllable diffusion generative model (i.e. AeroGen) tailored for RSIOD. To our knowledge, AeroGen is the first model to simultaneously support horizontal and rotated bounding box condition generation, thus enabling the generation of high-quality synthetic images that meet specific layout and object category requirements. Additionally, we propose an end-to-end data augmentation framework that integrates a diversity-conditioned generator and a filtering mechanism to enhance both the diversity and quality of generated data. Experimental results demonstrate that the synthetic data produced by our method are of high quality and diversity. Furthermore, the synthetic RSIOD data can significantly improve the detection performance of existing RSIOD models, i.e., the mAP metrics on DIOR, DIOR-R, and HRSC datasets are improved by 3.7%, 4.3%, and 2.43%, respectively. The code is available at https://github.com/Sonettoo/AeroGen.
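The difference between the horizontal and rotated bounding boxes mentioned above can be illustrated with a short Python sketch. It assumes that a rotated box is described as (cx, cy, w, h, angle in radians); this parameterisation and the function names are assumptions made for illustration and are not AeroGen's actual conditioning format.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Four corners of a w x h box centred at (cx, cy) and rotated by angle_rad.

    The rotation is counter-clockwise in a y-up coordinate system; in image
    coordinates (y pointing down) the same formula appears clockwise.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in half_offsets
    ]


def enclosing_horizontal_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# A 4 x 2 box rotated by 45 degrees needs a larger horizontal box to contain it.
corners = rotated_box_corners(0.0, 0.0, 4.0, 2.0, math.radians(45))
print(enclosing_horizontal_box(corners))
```

The enclosing horizontal box is always at least as large as the rotated one, which is one reason rotated annotations are common for elongated objects seen from above.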

arXiv information

Authors	Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, Deyu Meng
Published	2025-02-24 15:22:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

CAR-LOAM: Color-Assisted Robust LiDAR Odometry and Mapping

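Posted: February 25, 2025 | Author: jarxiv

Summary

In this letter, we propose a color-assisted robust framework for accurate LiDAR odometry and mapping (LOAM).
Receiving data from the LiDAR and the camera simultaneously, the framework uses color information from the camera images to colorize the LiDAR point clouds and then performs iterative pose optimization.
For each LiDAR scan, edge and planar features are extracted, colored using the corresponding image, and matched to a global map.
Specifically, we adopt a perceptually uniform color-difference weighting strategy to exclude color correspondence outliers, and a robust error metric based on Welsch's function to mitigate the impact of positional correspondence outliers during pose optimization.
As a result, the system achieves accurate localization and reconstructs dense, accurate, colored three-dimensional (3D) maps of the environment.
Thorough experiments in challenging scenarios, including complex forests and a campus, show that the method is more robust and accurate than current state-of-the-art methods.

Summary (original)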

In this letter, we propose a color-assisted robust framework for accurate LiDAR odometry and mapping (LOAM). Simultaneously receiving data from both the LiDAR and the camera, the framework utilizes the color information from the camera images to colorize the LiDAR point clouds and then performs iterative pose optimization. For each LiDAR scan, the edge and planar features are extracted and colored using the corresponding image and then matched to a global map. Specifically, we adopt a perceptually uniform color difference weighting strategy to exclude color correspondence outliers and a robust error metric based on the Welsch’s function to mitigate the impact of positional correspondence outliers during the pose optimization process. As a result, the system achieves accurate localization and reconstructs dense, accurate, colored and three-dimensional (3D) maps of the environment. Thorough experiments with challenging scenarios, including complex forests and a campus, show that our method provides higher robustness and accuracy compared with current state-of-the-art methods.
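To show why a Welsch-based error metric suppresses outliers, here is a minimal Python sketch of the Welsch loss and its iteratively reweighted least squares (IRLS) weight. It uses the convention rho(r) = (c^2 / 2)(1 - exp(-(r / c)^2)); other scalings of the exponent appear in the literature, and the abstract does not say which one the authors use. The one-dimensional `robust_mean` is a toy stand-in for pose optimization, and the sample data is made up.

```python
import math
from statistics import median


def welsch_loss(residual: float, scale: float) -> float:
    """Welsch loss rho(r) = (c^2 / 2) * (1 - exp(-(r / c)^2)); bounded by c^2 / 2."""
    return 0.5 * scale ** 2 * (1.0 - math.exp(-((residual / scale) ** 2)))


def welsch_weight(residual: float, scale: float) -> float:
    """IRLS weight w(r) = rho'(r) / r = exp(-(r / c)^2); tends to 0 for large r."""
    return math.exp(-((residual / scale) ** 2))


def robust_mean(values, scale=1.0, iterations=10):
    """Toy 1D example: iteratively reweighted mean using Welsch weights."""
    estimate = median(values)  # robust starting point
    for _ in range(iterations):
        weights = [welsch_weight(v - estimate, scale) for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


samples = [1.0, 1.1, 0.9, 10.0]  # made-up data; the last value is an outlier
print(sum(samples) / len(samples))  # plain mean, pulled toward the outlier
print(robust_mean(samples))         # stays close to the inliers
```

Because the weight decays exponentially with the squared residual, a correspondence far from the current estimate contributes almost nothing to the update, whereas a squared-error loss would let it dominate.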

arXiv information

Authors	Yufei Lu, Yuetao Li, Zhizhou Jia, Qun Hao, Shaohui Zhang
Published	2025-02-24 15:28:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.RO

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

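Posted: February 25, 2025 | Author: jarxiv

Summary

Recent advances in diffusion models have significantly improved video generation and editing capabilities.
However, multi-grained video editing, which covers class-level, instance-level, and part-level modifications, remains a formidable challenge.
The main difficulties in multi-grained editing are semantic misalignment in text-to-region control and feature coupling within the diffusion model.
To address these difficulties, we present VideoGrain, a zero-shot approach that modulates space-time (cross- and self-) attention mechanisms to achieve fine-grained control over video content.
We strengthen text-to-region control by amplifying each local prompt's attention to its corresponding spatially disentangled region while minimizing interactions with irrelevant areas in cross-attention.
In addition, we improve feature separation by increasing intra-region awareness and reducing inter-region interference in self-attention.
Extensive experiments show that our method achieves state-of-the-art performance in real-world scenarios.
Code, data, and demos are available at https://knightyxp.github.io/VideoGrain_project_page/

Summary (original)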

Recent advancements in diffusion models have significantly improved video generation and editing capabilities. However, multi-grained video editing, which encompasses class-level, instance-level, and part-level modifications, remains a formidable challenge. The major difficulties in multi-grained editing include semantic misalignment of text-to-region control and feature coupling within the diffusion model. To address these difficulties, we present VideoGrain, a zero-shot approach that modulates space-time (cross- and self-) attention mechanisms to achieve fine-grained control over video content. We enhance text-to-region control by amplifying each local prompt’s attention to its corresponding spatial-disentangled region while minimizing interactions with irrelevant areas in cross-attention. Additionally, we improve feature separation by increasing intra-region awareness and reducing inter-region interference in self-attention. Extensive experiments demonstrate our method achieves state-of-the-art performance in real-world scenarios. Our code, data, and demos are available at https://knightyxp.github.io/VideoGrain_project_page/
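The general idea of amplifying attention inside a region and suppressing it elsewhere can be sketched in a few lines of Python. This is a generic illustration of biasing attention logits with a region mask before the softmax, not the authors' implementation. The function names, the additive form of the bias, and the `boost` and `suppress` values are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_attention(logits, in_region, boost=2.0, suppress=2.0):
    """Bias attention logits with a region mask, then normalize each row.

    logits[i][j]    raw score from query i (e.g. a pixel) to key j (e.g. a token)
    in_region[i][j] True when key j belongs to the prompt whose region contains query i
    boost/suppress  hypothetical bias strengths, not values from the paper
    """
    rows = []
    for scores, mask in zip(logits, in_region):
        biased = [s + boost if m else s - suppress for s, m in zip(scores, mask)]
        rows.append(softmax(biased))
    return rows


# One query with two equally scored keys; only the first key matches its region.
weights = modulated_attention([[0.0, 0.0]], [[True, False]])
print(weights[0])
```

With equal raw scores, the bias alone shifts almost all of the attention mass to the key that matches the query's region, which is the qualitative effect the abstract describes for text-to-region control.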

arXiv information

Authors	Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
Published	2025-02-24 15:39:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Continuous Wrist Control on the Hannes Prosthesis: a Vision-based Shared Autonomy Framework

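Posted: February 25, 2025 | Author: jarxiv

Summary

Most control techniques for prosthetic grasping focus on dexterous finger control but overlook wrist motion.
This forces the user to perform compensatory movements with the elbow, shoulder, and hip to adapt the wrist for grasping.
We propose a computer vision-based system that leverages collaboration between the user and an automatic system in a shared autonomy framework to continuously control the wrist degrees of freedom of a prosthetic arm, promoting a more natural approach-to-grasp motion.
Our pipeline seamlessly controls the prosthetic wrist to follow the target object and finally orients it for grasping according to the user's intent.
We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm.
Code and videos: https://hsp-iit.github.io/hannes-wrist-control.

Summary (original)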

Most control techniques for prosthetic grasping focus on dexterous fingers control, but overlook the wrist motion. This forces the user to perform compensatory movements with the elbow, shoulder and hip to adapt the wrist for grasping. We propose a computer vision-based system that leverages the collaboration between the user and an automatic system in a shared autonomy framework, to perform continuous control of the wrist degrees of freedom in a prosthetic arm, promoting a more natural approach-to-grasp motion. Our pipeline allows to seamlessly control the prosthetic wrist to follow the target object and finally orient it for grasping according to the user intent. We assess the effectiveness of each system component through quantitative analysis and finally deploy our method on the Hannes prosthetic arm. Code and videos: https://hsp-iit.github.io/hannes-wrist-control.

arXiv information

Authors	Federico Vasile, Elisa Maiettini, Giulia Pasquale, Nicolò Boccardo, Lorenzo Natale
Published	2025-02-24 15:48:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.RO, cs.SY, eess.SY

FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring

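Posted: February 25, 2025 | Author: jarxiv

Summary

Forests are vital to ecosystems, supporting biodiversity and essential services, but they are changing rapidly because of land use and climate change.
Understanding and mitigating the negative effects requires parsing global-scale forest data from a broad range of sensing modalities and using it in diverse forest monitoring applications.
This diversity in data and applications can be addressed effectively by developing a large, pre-trained foundation model that serves as a versatile base for various downstream tasks.
However, remote sensing modalities, which are well suited to several forest management tasks, are particularly challenging given the variation in environmental conditions, object scales, image acquisition modes, spatio-temporal resolutions, and so on.
With that in mind, we present the first unified Forest Monitoring Benchmark (FoMo-Bench), carefully constructed to evaluate foundation models with such flexibility.
FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data. They cover a variety of geographic regions and include multispectral, red-green-blue, synthetic aperture radar, and LiDAR data with various temporal, spatial, and spectral resolutions.
FoMo-Bench includes multiple types of forest monitoring tasks, spanning classification, segmentation, and object detection.
To enhance task and geographic diversity in FoMo-Bench, we introduce TalloS, a global dataset that combines satellite imagery with ground-based annotations for tree species classification across more than 1,000 categories and hierarchical taxonomic levels.
Finally, we propose FoMo-Net, a pre-training framework for developing foundation models able to process any combination of the modalities and spectral bands commonly used in remote sensing.

Summary (original)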

Forests are vital to ecosystems, supporting biodiversity and essential services, but are rapidly changing due to land use and climate change. Understanding and mitigating negative effects requires parsing data on forests at global scale from a broad array of sensory modalities, and using them in diverse forest monitoring applications. Such diversity in data and applications can be effectively addressed through the development of a large, pre-trained foundation model that serves as a versatile base for various downstream tasks. However, remote sensing modalities, which are an excellent fit for several forest management tasks, are particularly challenging considering the variation in environmental conditions, object scales, image acquisition modes, spatio-temporal resolutions, etc. With that in mind, we present the first unified Forest Monitoring Benchmark (FoMo-Bench), carefully constructed to evaluate foundation models with such flexibility. FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data, covering a variety of geographical regions, and including multispectral, red-green-blue, synthetic aperture radar and LiDAR data with various temporal, spatial and spectral resolutions. FoMo-Bench includes multiple types of forest-monitoring tasks, spanning classification, segmentation, and object detection. To enhance task and geographic diversity in FoMo-Bench, we introduce TalloS, a global dataset combining satellite imagery with ground-based annotations for tree species classification across 1,000+ categories and hierarchical taxonomic levels. Finally, we propose FoMo-Net, a pre-training framework to develop foundation models with the capacity to process any combination of commonly used modalities and spectral bands in remote sensing.

arXiv information

Authors	Nikolaos Ioannis Bountos, Arthur Ouaknine, Ioannis Papoutsis, David Rolnick
Published	2025-02-24 15:49:53+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

TV-based Deep 3D Self Super-Resolution for fMRI

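Posted: February 25, 2025 | Author: jarxiv

Summary

Functional magnetic resonance imaging (fMRI) offers valuable insights into cognitive processes, but its inherent spatial limitations make detailed analysis of the brain's fine-grained functional architecture challenging.
More specifically, MRI scanner and sequence specifications impose a trade-off between temporal resolution, spatial resolution, signal-to-noise ratio, and scan time.
Deep learning (DL) super-resolution (SR) methods have emerged as a promising way to enhance fMRI resolution, generating high-resolution (HR) images from low-resolution (LR) images that are typically acquired with shorter scan times.
However, most existing SR approaches rely on supervised DL techniques that require ground truth (GT) HR data for training.
In this paper, we introduce a novel self-supervised DL SR model that combines a DL network with an analytical approach and Total Variation (TV) regularization.
Our method eliminates the need for external GT images, achieves competitive performance compared with supervised DL techniques, and preserves the functional maps.

Summary (original)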

While functional Magnetic Resonance Imaging (fMRI) offers valuable insights into cognitive processes, its inherent spatial limitations pose challenges for detailed analysis of the fine-grained functional architecture of the brain. More specifically, MRI scanner and sequence specifications impose a trade-off between temporal resolution, spatial resolution, signal-to-noise ratio, and scan time. Deep Learning (DL) Super-Resolution (SR) methods have emerged as a promising solution to enhance fMRI resolution, generating high-resolution (HR) images from low-resolution (LR) images typically acquired with lower scanning times. However, most existing SR approaches depend on supervised DL techniques, which require training ground truth (GT) HR data, which is often difficult to acquire and simultaneously sets a bound for how far SR can go. In this paper, we introduce a novel self-supervised DL SR model that combines a DL network with an analytical approach and Total Variation (TV) regularization. Our method eliminates the need for external GT images, achieving competitive performance compared to supervised DL techniques and preserving the functional maps.
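For readers unfamiliar with Total Variation regularization, the following Python sketch computes an anisotropic TV penalty on a small 3D volume and adds it to a data-fidelity term. It is a generic illustration, not the model described in the abstract. The anisotropic (sum of absolute differences) form, the weighting parameter `lam`, and the function names are assumptions, since the abstract does not specify the formulation used.

```python
def total_variation_3d(volume):
    """Anisotropic TV: sum of absolute differences between neighbouring voxels.

    `volume` is a nested list indexed as volume[x][y][z].
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    tv = 0.0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                value = volume[x][y][z]
                if x + 1 < nx:
                    tv += abs(volume[x + 1][y][z] - value)
                if y + 1 < ny:
                    tv += abs(volume[x][y + 1][z] - value)
                if z + 1 < nz:
                    tv += abs(volume[x][y][z + 1] - value)
    return tv


def regularized_loss(fidelity, volume, lam):
    """Generic objective: data-fidelity term plus lam times the TV penalty."""
    return fidelity + lam * total_variation_3d(volume)


flat = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
spiky = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
print(total_variation_3d(flat))   # a constant volume has no variation
print(total_variation_3d(spiky))  # an isolated bright voxel is penalized
```

The penalty favours piecewise-smooth volumes, which is why TV is a common regularizer when no high-resolution ground truth is available to constrain the reconstruction.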

arXiv information

Authors	Fernando Pérez-Bueno, Hongwei Bran Li, Matthew S. Rosen, Shahin Nasr, Cesar Caballero-Gaudes, Juan Eugenio Iglesias
Published	2025-02-24 15:54:50+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, eess.IV

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

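Posted: February 25, 2025 | Author: jarxiv

Summary

Recent advances in image-conditioned image generation have shown substantial progress.
However, foreground-conditioned image generation remains underexplored and faces challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility.
These challenges stem from current end-to-end inpainting models, which suffer from inaccurate training masks, limited semantic understanding of the foreground, data distribution biases, and inherent interference between visual and textual prompts.
To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach.
In this framework, each agent specializes in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, or textual prompt consistency.
The framework is further enhanced by the ability to incorporate optional user text input, perform automated quality assessment, and initiate re-generation as needed.
Comprehensive experiments show that this modular design effectively overcomes the limitations of existing end-to-end models, yielding higher fidelity, quality, diversity, and controllability in foreground-conditioned image generation.
In addition, the Anywhere framework is extensible and can benefit from future advances in the individual agents.

Summary (original)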

Recent advancements in image-conditioned image generation have demonstrated substantial progress. However, foreground-conditioned image generation remains underexplored, encountering challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility. These challenges arise from current end-to-end inpainting models, which suffer from inaccurate training masks, limited foreground semantic understanding, data distribution biases, and inherent interference between visual and textual prompts. To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach. In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency. Our framework is further enhanced with the ability to incorporate optional user textual inputs, perform automated quality assessments, and initiate re-generation as needed. Comprehensive experiments demonstrate that this modular design effectively overcomes the limitations of existing end-to-end models, resulting in higher fidelity, quality, diversity and controllability in foreground-conditioned image generation. Additionally, the Anywhere framework is extensible, allowing it to benefit from future advancements in each individual agent.

arXiv information

Authors	Tianyidan Xie, Rui Ma, Qian Wang, Xiaoqian Ye, Feixuan Liu, Ying Tai, Zhenyu Zhang, Lanjun Wang, Zili Yi
Published	2025-02-24 15:59:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.GR

GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow

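Posted: February 25, 2025 | Author: jarxiv

Summary

Occupancy estimation has become a prominent task in 3D computer vision, particularly within the autonomous driving community.
In this paper, we present a novel approach to occupancy estimation called GaussianFlowOcc, which is inspired by Gaussian Splatting and replaces traditional dense voxel grids with a sparse 3D Gaussian representation.
Our efficient model architecture, based on a Gaussian Transformer, significantly reduces computational and memory requirements by eliminating the expensive 3D convolutions used with inefficient voxel-based representations, which mostly represent empty 3D space.
GaussianFlowOcc effectively captures scene dynamics by estimating the temporal flow of each Gaussian during the overall network training process, offering a straightforward solution to a complex problem that existing methods often neglect.
Moreover, GaussianFlowOcc is designed for scalability: it uses weak supervision and does not require costly dense 3D voxel annotations based on additional data (e.g., LiDAR).
Through extensive experiments, we demonstrate that GaussianFlowOcc significantly outperforms all previous methods for weakly supervised occupancy estimation on the nuScenes dataset while offering inference that is 50 times faster than the current SOTA.

Summary (original)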

Occupancy estimation has become a prominent task in 3D computer vision, particularly within the autonomous driving community. In this paper, we present a novel approach to occupancy estimation, termed GaussianFlowOcc, which is inspired by Gaussian Splatting and replaces traditional dense voxel grids with a sparse 3D Gaussian representation. Our efficient model architecture based on a Gaussian Transformer significantly reduces computational and memory requirements by eliminating the need for expensive 3D convolutions used with inefficient voxel-based representations that predominantly represent empty 3D spaces. GaussianFlowOcc effectively captures scene dynamics by estimating temporal flow for each Gaussian during the overall network training process, offering a straightforward solution to a complex problem that is often neglected by existing methods. Moreover, GaussianFlowOcc is designed for scalability, as it employs weak supervision and does not require costly dense 3D voxel annotations based on additional data (e.g., LiDAR). Through extensive experimentation, we demonstrate that GaussianFlowOcc significantly outperforms all previous methods for weakly supervised occupancy estimation on the nuScenes dataset while featuring an inference speed that is 50 times faster than current SOTA.

arXiv information

Authors	Simon Boeder, Fabian Gigengack, Benjamin Risse
Published	2025-02-24 16:16:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

A novel approach to navigate the taxonomic hierarchy to address the Open-World Scenarios in Medicinal Plant Classification

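Posted: February 25, 2025 | Author: jarxiv

Summary

In this article, we propose a novel approach to hierarchical taxonomic classification of plants by posing it as an open-class problem.
Existing methods for medicinal plant classification often fail to perform hierarchical classification and to identify unknown species accurately, which limits their effectiveness in comprehensive plant taxonomy classification.
We therefore address the classification of unknown species by assigning them the best hierarchical labels.
For hierarchical classification, we propose a novel method that integrates DenseNet121, Multi-Scale Self-Attention (MSSA), and cascaded classifiers.
The approach systematically categorizes medicinal plants at multiple taxonomic levels, from phylum to species, ensuring detailed and precise classification.
Using multi-scale spatial attention, the model captures both local and global contextual information from the images, improving the distinction between similar species and the identification of new ones.
It uses attention scores to focus on important features across multiple scales.
The proposed method provides a solution for hierarchical classification and shows superior performance in identifying both known and unknown species.
The model was tested on two state-of-the-art datasets, with and without background artifacts, so that it can be deployed in real-world applications.
We used unknown species to test the model.
For unknown species, the model achieved average accuracies of 83.36%, 78.30%, 60.34%, and 43.32% in predicting the correct phylum, class, order, and family, respectively.
The proposed model is almost four times smaller than existing state-of-the-art methods, making it easy to deploy in real-world applications.

Summary (original)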

In this article, we propose a novel approach for plant hierarchical taxonomy classification by posing the problem as an open class problem. It is observed that existing methods for medicinal plant classification often fail to perform hierarchical classification and accurately identifying unknown species, limiting their effectiveness in comprehensive plant taxonomy classification. Thus we address the problem of unknown species classification by assigning it best hierarchical labels. We propose a novel method, which integrates DenseNet121, Multi-Scale Self-Attention (MSSA) and cascaded classifiers for hierarchical classification. The approach systematically categorizes medicinal plants at multiple taxonomic levels, from phylum to species, ensuring detailed and precise classification. Using multi scale space attention, the model captures both local and global contextual information from the images, improving the distinction between similar species and the identification of new ones. It uses attention scores to focus on important features across multiple scales. The proposed method provides a solution for hierarchical classification, showcasing superior performance in identifying both known and unknown species. The model was tested on two state-of-art datasets with and without background artifacts and so that it can be deployed to tackle real word application. We used unknown species for testing our model. For unknown species the model achieved an average accuracy of 83.36%, 78.30%, 60.34% and 43.32% for predicting correct phylum, class, order and family respectively. Our proposed model size is almost four times less than the existing state of the art methods making it easily deploy able in real world application.
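One simple way to picture assigning the "best hierarchical labels" to an unknown species is to walk the taxonomy from coarse to fine and stop at the first level where the classifier is no longer confident. The Python sketch below illustrates that idea only. The confidence threshold, the function name, and the example confidence values are hypothetical, and the abstract does not state that the authors use this stopping rule.

```python
def deepest_confident_labels(level_predictions, threshold=0.5):
    """Keep labels from coarse to fine until confidence drops below the threshold.

    level_predictions: list of (level, label, confidence) tuples ordered from the
    coarsest level (phylum) to the finest. The threshold is a hypothetical choice.
    """
    accepted = []
    for level, label, confidence in level_predictions:
        if confidence < threshold:
            break  # finer levels are not trusted once a coarser one fails
        accepted.append((level, label))
    return accepted


# Made-up confidences for an image of a species absent from the training set.
predictions = [
    ("phylum", "Tracheophyta", 0.95),
    ("class", "Magnoliopsida", 0.90),
    ("order", "Lamiales", 0.40),
    ("family", "Lamiaceae", 0.80),
]
print(deepest_confident_labels(predictions))
```

Stopping at the first uncertain level matches the pattern in the reported results, where accuracy on unknown species is highest at the phylum level and falls at each finer level.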

arXiv information

Authors	Soumen Sinha, Tanisha Rana, Rahul Roy
Published	2025-02-24 16:20:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CV
