Archive of posts by "jarxiv"

Steering CLIP’s vision transformer with sparse autoencoders

Posted: April 14, 2025 | Author: jarxiv

Abstract: Vision models are highly capable, but their internal mechanisms are poorly understood. …

Categories: cs.AI, cs.CV, cs.LG

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Posted: April 14, 2025 | Author: jarxiv

Abstract: In autoregressive (AR) image generation, visual tokenizers compress images into compact discrete latent …

Categories: cs.CV

ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis

Posted: April 14, 2025 | Author: jarxiv

Abstract: Recent work on scene reconstruction and understanding, grounding natural language in physical 3D environments …

Categories: cs.CV, cs.RO

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Posted: April 14, 2025 | Author: jarxiv

Abstract: Despite steady progress in machine translation evaluation, with respect to sentence boundaries, existing automatic metrics …

Categories: cs.CL, cs.LG

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

Posted: April 14, 2025 | Author: jarxiv

Abstract: 135 billion parameters and Ascend Neural Processing …

Categories: cs.AI, cs.CL

Enhancing Human-Robot Interaction in Healthcare: A Study on Nonverbal Communication Cues and Trust Dynamics with NAO Robot Caregivers

Posted: April 14, 2025 | Author: jarxiv

Abstract: As the elderly population grows, both human and robot caregivers will be needed …

Categories: cs.HC, cs.RO

A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions

Posted: April 14, 2025 | Author: jarxiv

Abstract: For large language models, synthesizing high-quality reasoning data for continual training …

Categories: cs.CL

Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition

Posted: April 14, 2025 | Author: jarxiv

Abstract: Sign language enables nuanced expression through gestures, facial expressions, and body movements, for hearing …

Categories: cs.CV

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

Posted: April 14, 2025 | Author: jarxiv

Abstract: Visual Grounding (VG): based on natural language descriptions, in images …

Categories: cs.AI, cs.CV

Scaling Laws for Native Multimodal Models

Posted: April 14, 2025 | Author: jarxiv

Abstract: Building general-purpose models that can effectively perceive the world through multimodal signals has long …

Categories: cs.CV
