Monthly Archives: June 2024

Multimodal Contextualized Semantic Parsing from Speech

Posted: June 11, 2024 | Author: jarxiv

Summary: By integrating multimodal inputs with prior context, artificial agents' …

Categories: cs.CL, cs.CV, cs.HC, cs.LG, cs.SD, eess.AS

VCR: Visual Caption Restoration

Posted: June 11, 2024 | Author: jarxiv

Summary: Using pixel-level hints within images, we … partially occluded text …

Categories: cs.CV, cs.LG

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Posted: June 11, 2024 | Author: jarxiv

Summary: Text-guided video prediction (TVP) involves, following an instruction, the initial frame …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM

Active Neural 3D Reconstruction with Colorized Surface Voxel-based View Selection

Posted: June 11, 2024 | Author: jarxiv

Summary: In 3D scene reconstruction, active view selection … beneficial to the reconstruction …

Categories: cs.AI, cs.CV

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

Posted: June 11, 2024 | Author: jarxiv

Summary: Recent advances in generative vision-language models (VLMs) … AI in radiology …

Categories: cs.CL, cs.CV, cs.LG

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

Posted: June 11, 2024 | Author: jarxiv

Summary: Existing video captioning benchmarks and models … linked through causal relationships …

Categories: cs.CV, cs.HC

An unsupervised approach towards promptable defect segmentation in laser-based additive manufacturing by Segment Anything

Posted: June 11, 2024 | Author: jarxiv

Summary: Foundation models are currently … in a range of fields such as biology, astronomy, and robotics …

Categories: cs.CV

BloomVQA: Assessing Hierarchical Multi-modal Comprehension

Posted: June 11, 2024 | Author: jarxiv

Summary: We … to facilitate comprehensive evaluation of large vision-language models on comprehension tasks …

Categories: cs.CL, cs.CV, cs.LG

Improving Alignment and Robustness with Circuit Breakers

Posted: June 11, 2024 | Author: jarxiv

Summary: AI systems can take harmful actions and are highly vulnerable to adversarial attacks …

Categories: cs.AI, cs.CL, cs.CV, cs.CY, cs.LG

Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

Posted: June 11, 2024 | Author: jarxiv

Summary: Given the remarkable results of motion synthesis with diffusion models, a natural question arises …

Categories: cs.AI, cs.CV, cs.GR
