"cs.CV" Category Archive

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Posted on February 18, 2025 by jarxiv

Summary: With the remarkable success of the autoregressive paradigm, multimodal large language models (M …

Categories: cs.CV

VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution

Posted on February 18, 2025 by jarxiv

Summary: 3D volumetric video offers immersive experiences and is gaining traction in digital media …

Categories: cs.CV, cs.SY, eess.SY

Diffusion Models without Classifier-free Guidance

Posted on February 18, 2025 by jarxiv

Summary: In this paper, the commonly used classifier-free guidance (CFG …

Categories: cs.AI, cs.CV, cs.LG

3D Whole-body Grasp Synthesis with Directional Controllability

Posted on February 18, 2025 by jarxiv

Summary: Synthesizing 3D whole bodies that realistically grasp objects, for animation …

Categories: cs.CV, cs.RO

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Posted on February 18, 2025 by jarxiv

Summary: With 30B parameters and the ability to generate videos up to 204 frames long …

Categories: cs.CL, cs.CV

Compress image to patches for Vision Transformer

Posted on February 18, 2025 by jarxiv

Summary: The Vision Transformer (ViT) has made great progress in the field of computer vision …

Categories: cs.CV

Magic 1-For-1: Generating One Minute Video Clips within One Minute

Posted on February 18, 2025 by jarxiv

Summary: In this technical report, with optimized memory consumption and inference latency, an efficient …

Categories: cs.CV

Vision-based Geo-Localization of Future Mars Rotorcraft in Challenging Illumination Conditions

Posted on February 17, 2025 by jarxiv

Summary: Planetary exploration using aerial assets has the potential for unprecedented scientific discoveries on Mars …

Categories: cs.CV, cs.RO

PUGS: Perceptual Uncertainty for Grasp Selection in Underwater Environments

Posted on February 17, 2025 by jarxiv

Summary: When navigating and interacting in challenging environments where sensory information is imperfect and incomplete, …

Categories: cs.CV, cs.RO

V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models

Posted on February 17, 2025 by jarxiv

Summary: Current autonomous driving vehicles rely mainly on their individual sensors to understand the surrounding scene and …

Categories: cs.CV, cs.RO
