Author archive: jarxiv

How Animals Dance (When You’re Not Looking)

Posted on May 30, 2025 by jarxiv

Summary: For generating music-synchronized, choreography-aware animal dance videos, a keyframe … Continue reading →

Categories: cs.CV, cs.GR | Comments closed

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

Posted on May 30, 2025 by jarxiv

Summary: The rapid progress of foundation models and large language models (LLMs), multimodal in … Continue reading →

Categories: cs.CV, cs.MM, cs.SD, eess.AS | Comments closed

LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization

Posted on May 30, 2025 by jarxiv

Summary: Image vectorization is a powerful technique that converts raster images into vector graphics … Continue reading →

Categories: cs.CV, cs.GR | Comments closed

MAGREF: Masked Guidance for Any-Reference Video Generation

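Posted on May 30, 2025 by jarxiv

Summary: Video generation has made major progress with the emergence of deep generative models, especially diffusion-based approaches … Continue reading →

Categories: cs.AI, cs.CV | Comments closed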

DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP

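Posted on May 30, 2025 by jarxiv

Summary: High-quality photography in extreme low-light environments is challenging, but for digital cameras it is impactful … Continue reading →

Categories: cs.CV | Comments closed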

Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need

Posted on May 30, 2025 by jarxiv

Summary: Deep neural networks (DNNs), when the data distribution changes over time, … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

To Trust Or Not To Trust Your Vision-Language Model’s Prediction

Posted on May 30, 2025 by jarxiv

Summary: Vision-language models (VLMs), in aligning visual and textual modalities … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments closed

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Posted on May 30, 2025 by jarxiv

Summary: Thanks to recent advances in multimodal large language models (MLLMs), 2D visual … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, I.2 | Comments closed

REOrdering Patches Improves Vision Models

Posted on May 30, 2025 by jarxiv

Summary: Sequence models such as transformers require inputs to be represented as one-dimensional sequences … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments closed

ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

Posted on May 30, 2025 by jarxiv

Summary: Thanks to recent advances in large language models (LLMs), through step-by-step reasoning, complex … Continue reading →

Categories: cs.CV | Comments closed
