"cs.CV" Category Archive

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

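Posted: October 15, 2024 | Author: jarxiv

Abstract: For multimodal video understanding and generation, understanding fine-grained temporal dynamics … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG | Comments are closed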

When Does Perceptual Alignment Benefit Vision Representations?

Posted: October 15, 2024 | Author: jarxiv

Abstract: Humans, with respect to scene layout, subject location, camera pose, and various other visual … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models

Posted: October 15, 2024 | Author: jarxiv

Abstract: 3D meshes, owing to their animation efficiency and minimal memory usage, … Continue reading →

Categories: cs.CV | Comments are closed

LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection

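Posted: October 15, 2024 | Author: jarxiv

Abstract: Due to the nature of enhancement, namely the absence of paired ground-truth information, recent … Continue reading →

Categories: cs.CV | Comments are closed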

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Posted: October 14, 2024 | Author: jarxiv

Abstract: Public urban spaces such as streetscapes and plazas serve residents and all manner of vibrant … Continue reading →

Categories: cs.AI, cs.CV, cs.RO | Comments are closed

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Posted: October 14, 2024 | Author: jarxiv

Abstract: For autonomous vehicles (AVs) to operate safely in dynamic, mixed human-robot environments … Continue reading →

Categories: cs.AI, cs.CV, cs.RO | Comments are closed

ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

Posted: October 14, 2024 | Author: jarxiv

Abstract: Parameter-efficient finetuning (PEFT), while preserving generalization ability, … Continue reading →

Categories: cs.CL, cs.CV, cs.LG | Comments are closed

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models

Posted: October 14, 2024 | Author: jarxiv

Abstract: With recent advances in diffusion models, text-to-image (T2I) generation has significantly … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed

HpEIS: Learning Hand Pose Embeddings for Multimedia Interactive Systems

Posted: October 14, 2024 | Author: jarxiv

Abstract: As a virtual sensor, a novel hand pose embedding interactive … Continue reading →

Categories: cs.CV, cs.HC | Comments are closed

VideoSAM: Open-World Video Segmentation

Posted: October 14, 2024 | Author: jarxiv

Abstract: Video segmentation is essential for advancing robotics and autonomous driving, … Continue reading →

Categories: cs.CV | Comments are closed
