Category archive: cs.CV

Optimal Stepsize for Diffusion Sampling

Posted: March 28, 2025 | Author: jarxiv

Summary: Diffusion models achieve remarkable generation quality, but suboptimal step discretization …

Category: cs.CV

Video-R1: Reinforcing Video Reasoning in MLLMs

Posted: March 28, 2025 | Author: jarxiv

Summary: In eliciting reasoning abilities through rule-based reinforcement learning (RL), Dee …

Category: cs.CV

Test-Time Visual In-Context Tuning

Posted: March 28, 2025 | Author: jarxiv

Summary: Visual in-context learning (VICL) is a new computer vision para …

Categories: cs.CV, cs.LG

HS-SLAM: Hybrid Representation with Structural Supervision for Improved Dense SLAM

Posted: March 28, 2025 | Author: jarxiv

Summary: NeRF-based SLAM has recently achieved promising results in tracking and reconstruction …

Category: cs.CV

Do Multimodal Large Language Models See Like Humans?

Posted: March 28, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs), across a variety of vision tasks, impress …

Category: cs.CV

X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

Posted: March 28, 2025 | Author: jarxiv

Summary: Four-dimensional computed tomography (4D CT) reconstruction, with respect to dynamic anatomical changes, …

Category: cs.CV

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Posted: March 28, 2025 | Author: jarxiv

Summary: Video understanding models often come with high computational requirements, extensive parameter counts, …

Category: cs.CV

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Posted: March 28, 2025 | Author: jarxiv

Summary: Customized text-to-video generation, with user-specified subje …

Category: cs.CV

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Posted: March 28, 2025 | Author: jarxiv

Summary: Open-vocabulary semantic segmentation models, with text qu …

Category: cs.CV

Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification

Posted: March 28, 2025 | Author: jarxiv

Summary: With the rapid increase in the number of computed tomography (CT) scan examinations, radiologists …

Category: cs.CV
