Monthly Archive: January 2025

Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding

Posted: January 6, 2025 | Author: jarxiv

Summary: Large vision-language models (LVLMs), for vision-language understanding in downstream multimodal tasks, …

Categories: cs.AI, cs.CV

Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models

Posted: January 6, 2025 | Author: jarxiv

Summary: In osteosarcoma, the most common primary bone cancer, for effective treatment planning and prognosis, …

Categories: cs.CV

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

Posted: January 6, 2025 | Author: jarxiv

Summary: In computer vision, efficiently reconstructing accurate 3D models from monocular video …

Categories: cs.CV

InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

Posted: January 6, 2025 | Author: jarxiv

Summary: The visual-text correlations in attention maps obtained from text-to-image diffusion models …

Categories: cs.CV

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Posted: January 6, 2025 | Author: jarxiv

Summary: With recent multimodal large language models (MLLMs), typically the visual and text modalities …

Categories: cs.CV, cs.SD, eess.AS

Simultaneous Latent State Estimation and Latent Linear Dynamics Discovery from Image Observations

Posted: January 6, 2025 | Author: jarxiv

Summary: The state estimation problem has a long history; given noisy observations, the posterior …

Categories: cs.LG

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Posted: January 6, 2025 | Author: jarxiv

Summary: As the code reasoning capabilities of existing large language models (LLMs) improve, and OpenAI o …

Categories: cs.CL

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

Posted: January 6, 2025 | Author: jarxiv

Summary: In recent years, across music tagging, instrument classification, key detection, and other music informatics understanding …

Categories: cs.AI, cs.CL, cs.LG, cs.SD, eess.AS

Speech Retrieval-Augmented Generation without Automatic Speech Recognition

Posted: January 6, 2025 | Author: jarxiv

Summary: One common approach to question answering over speech data starts with automatic speech recognition ( …

Categories: cs.AI, cs.CL, eess.AS

BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference

Posted: January 6, 2025 | Author: jarxiv

Summary: Large language models (LLMs) have achieved remarkable success, but as their size grows …

Categories: cs.CL, cs.LG
