Monthly Archives: February 2025

Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

Posted on February 18, 2025 by jarxiv

Summary: Visual speech recognition … visual ambiguity, inter-personal variability among speakers, the comp… of silence …

Categories: cs.CV

Steering the LoCoMotif: Using Domain Knowledge in Time Series Motif Discovery

Posted on February 18, 2025 by jarxiv

Summary: Time series motif discovery (TSMD) … recurring patterns in time series data …

Categories: cs.AI, cs.CV, cs.LG

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives

Posted on February 18, 2025 by jarxiv

Summary: By leveraging multiple sensory modalities, audio-visual learning … the real world more rich…

Categories: cs.CV, cs.SD

Defining and Evaluating Visual Language Models’ Basic Spatial Abilities: A Perspective from Psychometrics

Posted on February 18, 2025 by jarxiv

Summary: The theory of multiple intelligences emphasizes the hierarchical nature of cognitive abilities. …

Categories: cs.CL, cs.CV

Does Knowledge About Perceptual Uncertainty Help an Agent in Automated Driving?

Posted on February 18, 2025 by jarxiv

Summary: Agents in real-world scenarios such as automated driving …, particularly perceptual un…

Categories: cs.CV, cs.RO

Understanding Long Videos with Multimodal Language Models

Posted on February 18, 2025 by jarxiv

Summary: Large language models (LLMs) have made it possible for recent LLM-based approaches to …

Categories: cs.CV

iFormer: Integrating ConvNet and Transformer for Mobile Application

Posted on February 18, 2025 by jarxiv

Summary: A new … of mobile hybrid vision networks called iFormer …

Categories: cs.AI, cs.CV, cs.LG

From Open-Vocabulary to Vocabulary-Free Semantic Segmentation

Posted on February 18, 2025 by jarxiv

Summary: Open-vocabulary semantic segmentation enables models to … trai…

Categories: cs.CV

DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

Posted on February 18, 2025 by jarxiv

Summary: In this paper, a training-free … capable of exploiting adaptive temporal compression in the latent space …

Categories: cs.AI, cs.CV

Bridging Compressed Image Latents and Multimodal Large Language Models

Posted on February 18, 2025 by jarxiv

Summary: This paper … employing multimodal large language models (MLLMs) …

Categories: cs.CV, cs.LG, cs.MM
