Category archive: cs.CV

3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks

Posted: May 12, 2025 | Author: jarxiv

Summary: For robotic manipulation in 3D, the robot manipulator's $n$ freed …

Categories: cs.CV, cs.RO

RS2AD: End-to-End Autonomous Driving Data Generation from Roadside Sensor Observations

Posted: May 12, 2025 | Author: jarxiv

Summary: Processing multimodal sensory data to directly generate refined control commands, en …

Categories: cs.CV, cs.RO

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

Posted: May 12, 2025 | Author: jarxiv

Summary: In this paper, TAPTRv2, a TAPTR-based approach …

Categories: cs.CV, cs.RO

Enhancing Target-unspecific Tasks through a Features Matrix

Posted: May 12, 2025 | Author: jarxiv

Summary: Owing to recent developments in prompt learning for large vision-language models, target-specific …

Categories: cs.CL, cs.CV

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

Posted: May 12, 2025 | Author: jarxiv

Summary: For modern automotive infotainment systems, frequent user interfa …

Categories: cs.AI, cs.CV

Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection

Posted: May 12, 2025 | Author: jarxiv

Summary: In this paper, beyond identifying anomalies in purely structural terms, by the cause of the anomalies …

Categories: cs.AI, cs.CV

Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer

Posted: May 12, 2025 | Author: jarxiv

Summary: With the rapid development of OCR technology, mixed-scene text recognition, an important technical …

Categories: cs.CV

DFEN: Dual Feature Equalization Network for Medical Image Segmentation

Posted: May 12, 2025 | Author: jarxiv

Summary: Current methods for medical image segmentation, mainly from the perspective of the whole image, contex …

Categories: cs.CV

Visualization of a multidimensional point cloud as a 3D swarm of avatars

Posted: May 12, 2025 | Author: jarxiv

Summary: In this article, using icons inspired by Chernoff faces, …

Categories: cs.CV, cs.HC

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Posted: May 12, 2025 | Author: jarxiv

Summary: Recent advances in LVLMs have improved vision-language understanding, but they still …

Categories: cs.CV
