"cs.CL" Category Archive

CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

Posted: April 24, 2024 | Author: jarxiv

Summary: To enhance the cultural awareness of language models, various online communities …

Categories: cs.AI, cs.CL

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

Posted: April 24, 2024 | Author: jarxiv

Summary: Simply by merging upcycled Mixture-of-Experts (MoE), instruction-tuned …

Categories: cs.AI, cs.CL, cs.LG, cs.SE

Aligning LLM Agents by Learning Latent Preference from User Edits

Posted: April 24, 2024 | Author: jarxiv

Summary: We, based on user edits made to the agent's output, language …

Categories: cs.AI, cs.CL, cs.IR, cs.LG

Visual Grounding Methods for VQA are Working for the Wrong Reasons!

Posted: April 24, 2024 | Author: jarxiv

Summary: Existing visual question answering (VQA) methods, producing the right answers for the right reasons …

Categories: cs.AI, cs.CL, cs.CV

Subobject-level Image Tokenization

Posted: April 24, 2024 | Author: jarxiv

Summary: Transformer-based vision models typically use, as input units for images, fixed …

Categories: cs.CL, cs.CV

MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Posted: April 24, 2024 | Author: jarxiv

Summary: With the rapid progress of large-scale vision-language models, across a variety of tasks, remarkable …

Categories: cs.CL, cs.CV

Re-Thinking Inverse Graphics With Large Language Models

Posted: April 24, 2024 | Author: jarxiv

Summary: Inverse graphics (inverting an image into physical variables and, when rendered, the observed …

Categories: cs.CL, cs.CV

VideoXum: Cross-modal Visual and Textual Summarization of Videos

Posted: April 24, 2024 | Author: jarxiv

Summary: Video summarization extracts the most important information from the source video, and a summarized …

Categories: cs.CL, cs.CV

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

Posted: April 24, 2024 | Author: jarxiv

Summary: With recent advances in instruction-following models, interaction between users and models is more user …

Categories: cs.AI, cs.CL, cs.CV

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Posted: April 24, 2024 | Author: jarxiv

Summary: Medical Vision-Language Pretraining ( …

Categories: cs.AI, cs.CL, cs.CV

