Category archive: cs.LG

ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition

Posted: December 3, 2024 | Author: jarxiv

Summary: Transformer models, in natural language processing (NLP) and computer … Continue reading →

Categories: cs.CV, cs.HC, cs.LG | Comments Off

OminiControl: Minimal and Universal Control for Diffusion Transformer

Posted: December 3, 2024 | Author: jarxiv

Summary: In this paper, image conditions for a pre-trained Diffusion Transformer (DiT) … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments Off

Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure

Posted: December 3, 2024 | Author: jarxiv

Summary: In this work, by examining the hidden properties of the learned score function, diffusion model … Continue reading →

Categories: cs.CV, cs.LG, eess.IV, eess.SP | Comments Off

What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics

Posted: December 3, 2024 | Author: jarxiv

Summary: For educators, quickly assessing readability and fitting texts to diverse classroom needs … Continue reading →

Categories: cs.AI, cs.CL, cs.LG | Comments Off

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability

Posted: December 3, 2024 | Author: jarxiv

Summary: Large language models (LLMs) have shown remarkable performance on reasoning tasks … Continue reading →

Categories: cs.AI, cs.CL, cs.LG | Comments Off

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Posted: December 3, 2024 | Author: jarxiv

Summary: The success of multimodal large language models (MLLMs) in the image domain … Continue reading →

Categories: cs.CL, cs.CV, cs.LG | Comments Off

OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework

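Posted: December 2, 2024 | Author: jarxiv

Summary: Integrating large language models (LLMs) into autonomous driving systems, environmental understanding and … Continue reading →

Categories: cs.LG, cs.RO | Comments Off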

ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Posted: December 2, 2024 | Author: jarxiv

Summary: Reinforcement learning (RL) has shown compelling performance on robotic tasks … Continue reading →

Categories: cs.LG, cs.RO | Comments Off

Control-oriented Clustering of Visual Latent Representation

Posted: December 2, 2024 | Author: jarxiv

Summary: In image-based control pipelines learned from behavior cloning, we … Continue reading →

Categories: cs.CV, cs.LG, cs.RO | Comments Off

GRAPE: Generalizing Robot Policy via Preference Alignment

Posted: December 2, 2024 | Author: jarxiv

Summary: On a variety of robotics tasks, vision-language-action (VLA) … Continue reading →

Categories: cs.CV, cs.LG, cs.RO | Comments Off
