"cs.CV" Category Archive

GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

Posted: August 20, 2024 | Author: jarxiv

Summary: To facilitate one-shot visual teaching of robot manipulation, a general-purpose vision language model …

Categories: cs.CL, cs.CV, cs.RO

SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts

Posted: August 20, 2024 | Author: jarxiv

Summary: For self-driving vehicles to interact effectively with their environment and plan safe maneuvers, multimodal …

Categories: cs.CV, cs.RO

R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

Posted: August 20, 2024 | Author: jarxiv

Summary: Inspired by the tremendous success of large language models (LLMs), existing X-ray medical …

Categories: cs.AI, cs.CL, cs.CV

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Posted: August 20, 2024 | Author: jarxiv

Summary: In traditional animation generation methods, generative models using human-labeled data …

Categories: cs.CL, cs.CV, cs.MM

Docling Technical Report

Posted: August 20, 2024 | Author: jarxiv

Summary: In this technical report, for PDF document conversion, an easy-to-use, self-contained …

Categories: cs.CL, cs.CV, cs.SE

Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

Posted: August 20, 2024 | Author: jarxiv

Summary: Model editing, without costly retraining, within large models, outdated …

Categories: cs.CL, cs.CV

RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports

Posted: August 20, 2024 | Author: jarxiv

Summary: Vision-Language Foundation models, in computer …

Categories: cs.CV

Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision

Posted: August 20, 2024 | Author: jarxiv

Summary: Understanding human attention is crucial for vision science and AI. …

Categories: cs.AI, cs.CV

C${^2}$RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval

Posted: August 20, 2024 | Author: jarxiv

Summary: Sign language representation learning (SLRL), for sign language translation (SLT) and sign language retrieval (SLR …

Categories: cs.CL, cs.CV

Text-Conditioned Resampler For Long Form Video Understanding

Posted: August 20, 2024 | Author: jarxiv

Summary: In this paper, a pre-trained and frozen visual encoder …

Categories: cs.CV
