"cs.MM" Category Archive

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Posted on April 24, 2024 by jarxiv

Abstract: Large Vision-Language Models (LVLMs) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Posted on April 24, 2024 by jarxiv

Abstract: Recent studies leverage human preference datasets for text-to-image generation …

Categories: cs.CV, cs.MM

SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation

Posted on April 24, 2024 by jarxiv

Abstract: Existing Transformers for monocular 3D human shape and pose estimation …

Categories: cs.AI, cs.CV, cs.GR, cs.LG, cs.MM

TAVGBench: Benchmarking Text to Audible-Video Generation

Posted on April 23, 2024 by jarxiv

Abstract: Text to Audible-Video Generation (TAV …

Categories: cs.CV, cs.MM

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Posted on April 22, 2024 by jarxiv

Abstract: LiDAR-based moving object segmentation (MOS) …

Categories: cs.CV, cs.MM, cs.RO, eess.IV

Leveraging Automatic Personalised Nutrition: Food Image Recognition Benchmark and Dataset based on Nutrition Taxonomy

Posted on April 22, 2024 by jarxiv

Abstract: In today's sedentary society, characterized by poor eating habits, a healthy lifestyle …

Categories: cs.CV, cs.MM

Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing

Posted on April 22, 2024 by jarxiv

Abstract: In painterly image harmonization, disparate visual elements within a single coherent image are seamlessly …

Categories: cs.AI, cs.CV, cs.MM

Food Portion Estimation via 3D Object Scaling

Posted on April 19, 2024 by jarxiv

Abstract: With image-based methods for analyzing food images, the user … associated with traditional methods …

Categories: cs.AI, cs.CV, cs.LG, cs.MM, eess.IV

Can We Edit Multimodal Large Language Models?

Posted on April 19, 2024 by jarxiv

Abstract: This paper focuses on editing Multimodal Large Language Models (MLLMs) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Posted on April 19, 2024 by jarxiv

Abstract: Resource-constrained hardware, such as edge devices and mobile phones, …

Categories: cs.CV, cs.MM
