Monthly Archives: February 2024

GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Posted on February 21, 2024 by jarxiv

Abstract: For object detection in documents, understanding the hierarchical structure and the relationships among various elements …

Categories: cs.CV, cs.LG

VideoPrism: A Foundational Visual Encoder for Video Understanding

Posted on February 21, 2024 by jarxiv

Abstract: Tackling diverse video understanding tasks with a single frozen model, a general-purpose video …

Categories: cs.AI, cs.CV

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Posted on February 21, 2024 by jarxiv

Abstract: Even with the remarkable progress of multimodal large language models (MLLMs), …

Categories: cs.CL, cs.CV

Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data

Posted on February 21, 2024 by jarxiv

Abstract: For training AI algorithms, large, well-annotated …

Categories: cs.AI, cs.CV

A Touch, Vision, and Language Dataset for Multimodal Alignment

Posted on February 21, 2024 by jarxiv

Abstract: Touch is an important sensing modality for humans, but in multimodal generative language models …

Categories: cs.CV, cs.RO

AnoMalNet: Outlier Detection based Malaria Cell Image Classification Method Leveraging Deep Autoencoder

Posted on February 21, 2024 by jarxiv

Abstract: Class imbalance is a widespread problem in disease classification from medical images. …

Categories: cs.CV, eess.IV

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

Posted on February 21, 2024 by jarxiv

Abstract: Learning a human-like driving policy from large-scale driving demonstrations is promising …

Categories: cs.CV, cs.RO

Video ReCap: Recursive Captioning of Hour-Long Videos

Posted on February 21, 2024 by jarxiv

Abstract: Most video captioning models process short video clips of a few seconds and …

Categories: cs.CV

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Posted on February 21, 2024 by jarxiv

Abstract: Manually creating textures for 3D meshes is, for skilled visual …

Categories: cs.CV, cs.GR, cs.LG

Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields

Posted on February 21, 2024 by jarxiv

Abstract: In this paper, using only 2D images as supervision, decomposed low-rank …

Categories: cs.CV
