Monthly Archives: June 2024

ReLUs Are Sufficient for Learning Implicit Neural Representations

Posted on June 5, 2024 by jarxiv

Abstract: Using the Rectified Linear Unit (ReLU) as the activation function …

Categories: cs.AI, cs.CV, cs.LG, eess.IV

SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition

Posted on June 5, 2024 by jarxiv

Abstract: On-orbit servicing (OOS), spacecraft inspection, and active debris removal (ADR). These …

Categories: cs.CV

Enhancing predictive imaging biomarker discovery through treatment effect analysis

Posted on June 5, 2024 by jarxiv

Abstract: Identifying predictive biomarkers, which forecast individual treatment effects, is, for personalized medicine, …

Categories: cs.AI, cs.CV, cs.LG, eess.IV

Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks

Posted on June 5, 2024 by jarxiv

Abstract: Change detection, an essential application of high-resolution remote sensing images, …

Categories: cs.CV

Enhancing 2D Representation Learning with a 3D Prior

Posted on June 5, 2024 by jarxiv

Abstract: Learning robust and effective representations of visual data is, in computer vision, …

Categories: cs.CV

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

Posted on June 5, 2024 by jarxiv

Abstract: The top-view perspective is a typical way for humans to read and reason over various types of maps …

Categories: cs.CL, cs.CV, cs.LG

Parrot: Multilingual Visual Instruction Tuning

Posted on June 5, 2024 by jarxiv

Abstract: Multimodal large language models (MLLMs) such as GPT-4V have seen rapid …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Posted on June 5, 2024 by jarxiv

Abstract: With Diffusion Transformers (DiTs), realistic images and videos based on textual instructions …

Categories: cs.CV

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Posted on June 5, 2024 by jarxiv

Abstract: Thanks to recent advances in zero-shot video diffusion models, text-driven video editing …

Categories: cs.CV

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Posted on June 5, 2024 by jarxiv

Abstract: For training models with long in-context lengths, GPU memory and computational …

Categories: cs.CV
