Category Archives: cs.CV

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Posted on August 12, 2024 by jarxiv

Summary: GPT-4o's remarkable multimodal capabilities and interactive experience …

Categories: cs.AI, cs.CL, cs.CV

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Posted on August 12, 2024 by jarxiv

Summary: High-performance multimodal large language models (MLLMs) depend heavily on data quality …

Categories: cs.AI, cs.CV

GMISeg: General Medical Image Segmentation without Re-Training

Posted on August 12, 2024 by jarxiv

Summary: Online shopping behavior is characterized by rich granular dimensions and data sparsity …

Categories: cs.CV, eess.IV

MS-Twins: Multi-Scale Deep Self-Attention Networks for Medical Image Segmentation

Posted on August 12, 2024 by jarxiv

Summary: Chest X-ray imaging is one of the most common radiological examinations for diagnosing chest diseases …

Categories: cs.CV, eess.IV

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Posted on August 9, 2024 by jarxiv

Summary: Reinforcement learning (RL) with dense rewards and imitation learning with human-generated trajectories …

Categories: cs.AI, cs.CV, cs.RO

Grasping Trajectory Optimization with Point Clouds

Posted on August 9, 2024 by jarxiv

Summary: For robotic grasping based on point-cloud representations of the robot and its workspace, a new trajectory …

Categories: cs.CV, cs.RO

Edit As You Wish: Video Caption Editing with Multi-grained User Control

Posted on August 9, 2024 by jarxiv

Summary: Automatically narrating videos in natural language in response to user requests …

Categories: cs.CV, cs.MM

Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs

Posted on August 9, 2024 by jarxiv

Summary: Large language models (LLMs) and large multimodal models (LMMs) …

Categories: cs.CL, cs.CV

HARMamba: Efficient and Lightweight Wearable Sensor Human Activity Recognition Based on Bidirectional Mamba

Posted on August 9, 2024 by jarxiv

Summary: Wearable sensor-based human activity recognition (HAR) is … in activity recognition …

Categories: cs.AI, cs.CV

Fast and Accurate Object Detection on Asymmetrical Receptive Field

Posted on August 9, 2024 by jarxiv

Summary: Object detection is used across a wide range of industries. For example, object detection in autonomous driving …

Categories: cs.CV
