"cs.CV" Category Archive

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Posted on: March 28, 2025 | Author: jarxiv

Abstract: The rapid advancement of multimodal large language models (MLLMs), across various multi … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

Keyword-Oriented Multimodal Modeling for Euphemism Identification

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Euphemism identification, such as linking "weed" (the euphemism) to "marijuana" (the target keyword … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Existing benchmarks for vision-language models (VLMs) in autonomous driving (AD) ( … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Text-guided image editing, while maintaining general structure and background fidelity, natural language … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Diffusion models have emerged as the mainstream approach for visual generation. However, … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Accurate camera calibration, especially in the real world, where complex optical distortions are common, … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Recent advances in surgical computer vision applications, vision-only model … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments closed

When Astronomy Meets AI: Manazel For Crescent Visibility Prediction in Morocco

Posted on: March 28, 2025 | Author: jarxiv

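Abstract: Accurately determining the beginning of each Hijri month is essential for religious, cultural, and administrative purposes … Continue reading →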

Categories: cs.AI, cs.CV, cs.LG | Comments closed

Self-Contrastive Forward-Forward Algorithm

Posted on: March 28, 2025 | Author: jarxiv

Abstract: Agents that operate autonomously benefit from lifelong learning capabilities. However, … Continue reading →

Categories: cs.AI, cs.CV, cs.ET, cs.LG, cs.NE | Comments closed

