Category archive: cs.AI

Order-aware Interactive Segmentation

Posted: October 18, 2024 | Author: jarxiv

Abstract: In interactive segmentation, with minimal user interaction, the target …

Categories: cs.AI, cs.CV

Movie Gen: A Cast of Media Foundation Models

Posted: October 18, 2024 | Author: jarxiv

Abstract: With various aspect ratios and synchronized audio, high-quality 1080p …

Categories: cs.AI, cs.CV, cs.LG, eess.IV

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Posted: October 18, 2024 | Author: jarxiv

Abstract: Talking head generation, from a single portrait and a speech audio …

Categories: cs.AI, cs.CV

Corrective Machine Unlearning

Posted: October 18, 2024 | Author: jarxiv

Abstract: Machine learning models, on large-scale training data obtained from the internet …

Categories: cs.AI, cs.CR, cs.CV, cs.LG

Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks

Posted: October 18, 2024 | Author: jarxiv

Abstract: Diagnosis of diabetic retinopathy, which relies on fundus images, using a comprehensive classification approach …

Categories: cs.AI, cs.CV

Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models

Posted: October 18, 2024 | Author: jarxiv

Abstract: As models become more capable, evaluation becomes more complex, and in a single benchmark, …

Categories: cs.AI, cs.CV, cs.LG

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Posted: October 18, 2024 | Author: jarxiv

Abstract: In this paper, an autoregressive framework that unifies multimodal understanding and generation …

Categories: cs.AI, cs.CL, cs.CV

Retrospective Learning from Interactions

Posted: October 18, 2024 | Author: jarxiv

Abstract: Multi-turn interactions between large language models (LLMs) and users naturally …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Posted: October 18, 2024 | Author: jarxiv

Abstract: As the capabilities of multimodal large language models (MLLMs) continue to improve, …

Categories: cs.AI, cs.CL, cs.CV, cs.CY

Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2

Posted: October 18, 2024 | Author: jarxiv

Abstract: Anatomical landmarks, in medical imaging for navigation and anomaly detection, …

Categories: cs.AI, cs.CV, cs.LG
