jarxiv | Japanese arxiv | Page 889

Proof-Carrying Neuro-Symbolic Code

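Posted: April 17, 2025 by jarxiv

Summary

This invited paper introduces the concept of ‘proof-carrying neuro-symbolic code’ and explains its meaning and value from both the ‘neural’ and the ‘symbolic’ perspectives.
The talk outlines the first successes and challenges facing this new area of research.

Summary (original)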

This invited paper introduces the concept of ‘proof-carrying neuro-symbolic code’ and explains its meaning and value, from both the ‘neural’ and the ‘symbolic’ perspectives. The talk outlines the first successes and challenges that this new area of research faces.

arXiv information

Authors	Ekaterina Komendantskaya
Published	2025-04-16 12:42:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LO, cs.PL, I.2.0

SoK: Decentralized AI (DeAI)

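Posted: April 17, 2025 by jarxiv

Summary

Centralization improves the efficiency of artificial intelligence (AI), but it also brings critical challenges for AI systems, such as single points of failure, inherent biases, data privacy concerns, and scalability issues.
These problems are especially common in closed-source large language models (LLMs), where user data is collected and used with full transparency.
To address these issues, blockchain-based decentralized AI (DeAI) has been introduced.
DeAI leverages the strengths of blockchain technology to enhance the transparency, security, decentralization, and trustworthiness of AI systems.
Although DeAI has been widely developed in industry, a comprehensive understanding of state-of-the-art practical DeAI solutions is still lacking.
In this work, we present a Systematization of Knowledge (SoK) for blockchain-based DeAI solutions.
We propose a taxonomy for classifying existing DeAI protocols based on the model lifecycle.
Based on this taxonomy, we provide a structured way to clarify the landscape of DeAI protocols and to identify their similarities and differences.
Specifically, we analyze the functions of blockchain in DeAI, investigate how blockchain features contribute to improving the security, transparency, and trustworthiness of AI processes, and examine how they ensure fair incentives for contributors of AI data and models.
In addition, we provide key insights and research gaps in the development of DeAI protocols to guide future research.

Summary (original)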

Centralization enhances the efficiency of Artificial Intelligence (AI), but it also brings critical challenges, such as single points of failure, inherent biases, data privacy concerns, and scalability issues, for AI systems. These problems are especially common in closed-source large language models (LLMs), where user data is collected and used with full transparency. To address these issues, blockchain-based decentralized AI (DeAI) has been introduced. DeAI leverages the strengths of blockchain technologies to enhance the transparency, security, decentralization, as well as trustworthiness of AI systems. Although DeAI has been widely developed in industry, a comprehensive understanding of state-of-the-art practical DeAI solutions is still lacking. In this work, we present a Systematization of Knowledge (SoK) for blockchain-based DeAI solutions. We propose a taxonomy to classify existing DeAI protocols based on the model lifecycle. Based on this taxonomy, we provide a structured way to clarify the landscape of DeAI protocols and identify their similarities and differences. Specifically, we analyze the functionalities of blockchain in DeAI, investigate how blockchain features contribute to enhancing the security, transparency, and trustworthiness of AI processes, and also ensure fair incentives for AI data and model contributors. In addition, we provide key insights and research gaps in developing DeAI protocols for future research.

arXiv information

Authors	Zhipeng Wang, Rui Sun, Elizabeth Lui, Vatsal Shah, Xihan Xiong, Jiahao Sun, Davide Crapis, William Knottenbelt
Published	2025-04-16 12:51:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CR, cs.LG

RadMamba: Efficient Human Activity Recognition through Radar-based Micro-Doppler-Oriented Mamba State-Space Model

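Posted: April 17, 2025 by jarxiv

Summary

Radar-based HAR (human activity recognition) has emerged as a promising alternative to conventional monitoring approaches such as wearable devices and camera-based systems, owing to its unique privacy-preservation and robustness advantages.
However, existing solutions based on convolutional and recurrent neural networks, while effective, are computationally demanding during deployment.
This limits their applicability in scenarios with constrained resources or those requiring multiple sensors.
Advanced architectures such as ViT and SSM architectures offer improved modeling capabilities and have made efforts toward lightweight designs.
However, their computational complexity remains relatively high.
To leverage the strengths of transformer architectures while simultaneously improving accuracy and reducing computational complexity, this paper introduces RadMamba, a parameter-efficient, radar micro-Doppler-oriented Mamba SSM tailored specifically for radar-based HAR.
Across three diverse datasets, RadMamba matches the 99.8% classification accuracy of the previous top-performing model on Dataset DIAT with only 1/400 of its parameters, and equals the 92.0% accuracy of the leading models on Dataset CI4R with only 1/10 of their parameters.
In scenarios with continuous sequences of actions, evaluated on Dataset UoG2020, RadMamba outperforms other models with significantly higher parameter counts by at least 3%, and does so with only 6.7k parameters.
Our code is available at https://github.com/lab-emi/AIRHAR.

Summary (original)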

Radar-based HAR has emerged as a promising alternative to conventional monitoring approaches, such as wearable devices and camera-based systems, due to its unique privacy preservation and robustness advantages. However, existing solutions based on convolutional and recurrent neural networks, although effective, are computationally demanding during deployment. This limits their applicability in scenarios with constrained resources or those requiring multiple sensors. Advanced architectures, such as ViT and SSM architectures, offer improved modeling capabilities and have made efforts toward lightweight designs. However, their computational complexity remains relatively high. To leverage the strengths of transformer architectures while simultaneously enhancing accuracy and reducing computational complexity, this paper introduces RadMamba, a parameter-efficient, radar micro-Doppler-oriented Mamba SSM specifically tailored for radar-based HAR. Across three diverse datasets, RadMamba matches the top-performing previous model’s 99.8% classification accuracy on Dataset DIAT with only 1/400 of its parameters and equals the leading models’ 92.0% accuracy on Dataset CI4R with merely 1/10 of their parameters. In scenarios with continuous sequences of actions evaluated on Dataset UoG2020, RadMamba surpasses other models with significantly higher parameter counts by at least 3%, achieving this with only 6.7k parameters. Our code is available at: https://github.com/lab-emi/AIRHAR.
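RadMamba is described as a Mamba state-space model (SSM). As a rough illustration of the state-space idea only, the sketch below runs a discretized, diagonal, linear SSM over a scalar input sequence using zero-order-hold discretization. It is not RadMamba: it omits Mamba's input-dependent (selective) parameters and anything micro-Doppler-specific, and the function name, dimensions, and parameter values are illustrative assumptions.

```python
import math
from typing import List


def diagonal_ssm_scan(
    x: List[float],
    a: List[float],
    b: List[float],
    c: List[float],
    dt: float = 0.1,
) -> List[float]:
    """Hypothetical helper: run a diagonal linear SSM over a scalar sequence.

    Continuous model per state dim i:  h_i'(t) = a_i * h_i(t) + b_i * x(t)
    Zero-order-hold discretization:    h_i[k] = exp(a_i*dt) * h_i[k-1]
                                              + ((exp(a_i*dt) - 1) / a_i) * b_i * x[k]
    Readout:                           y[k] = sum_i c_i * h_i[k]
    Each a_i must be nonzero (negative values give a stable, decaying state).
    """
    a_bar = [math.exp(ai * dt) for ai in a]
    b_bar = [((ab - 1.0) / ai) * bi for ab, ai, bi in zip(a_bar, a, b)]
    h = [0.0] * len(a)
    y: List[float] = []
    for xk in x:
        # One recurrent step: decay the state, then inject the current input.
        h = [ab * hi + bb * xk for ab, hi, bb in zip(a_bar, h, b_bar)]
        y.append(sum(ci * hi for ci, hi in zip(c, h)))
    return y


# Impulse response of a two-state system: the output decays over time.
y = diagonal_ssm_scan([1.0, 0.0, 0.0, 0.0], a=[-1.0, -0.5], b=[1.0, 1.0], c=[1.0, 1.0])
```

Each step touches every state dimension once, so a pass is linear in sequence length; Mamba additionally makes the step size and the B and C parameters depend on the input.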

arXiv information

Authors	Yizhuo Wu, Francesco Fioranelli, Chang Gao
Published	2025-04-16 12:54:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, cs.LG

RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

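Posted: April 17, 2025 by jarxiv

Summary

Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks.
However, annotating RS images with rich linguistic semantics at scale requires RS expertise and substantial human labor, making it costly and often impractical.
In this study, we propose a workflow that leverages large language models (LLMs) to generate multimodal datasets with semantically rich captions at scale from plain OpenStreetMap (OSM) data for images sourced from the Google Earth Engine (GEE) platform.
This approach facilitates the generation of paired remote sensing data and can be readily scaled up using openly available data.
Within this framework, we present RSTeller, a multimodal dataset comprising over 1.3 million RS images, each accompanied by two descriptive captions.
Extensive experiments show that RSTeller improves the performance of multiple existing vision language models for RS scene understanding through continual pre-training.
Our methodology significantly reduces the manual effort and expertise needed to annotate remote sensing imagery while democratizing access to high-quality annotated data.
This advancement fosters progress in visual language modeling and encourages broader participation in remote sensing research and applications.
The RSTeller dataset is available at https://github.com/SlytherinGe/RSTeller.

Summary (original)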

Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often impractical. In this study, we propose a workflow that leverages large language models (LLMs) to generate multimodal datasets with semantically rich captions at scale from plain OpenStreetMap (OSM) data for images sourced from the Google Earth Engine (GEE) platform. This approach facilitates the generation of paired remote sensing data and can be readily scaled up using openly available data. Within this framework, we present RSTeller, a multimodal dataset comprising over 1.3 million RS images, each accompanied by two descriptive captions. Extensive experiments demonstrate that RSTeller enhances the performance of multiple existing vision language models for RS scene understanding through continual pre-training. Our methodology significantly reduces the manual effort and expertise needed for annotating remote sensing imagery while democratizing access to high-quality annotated data. This advancement fosters progress in visual language modeling and encourages broader participation in remote sensing research and applications. The RSTeller dataset is available at https://github.com/SlytherinGe/RSTeller.

arXiv information

Authors	Junyao Ge, Xu Zhang, Yang Zheng, Kaitai Guo, Jimin Liang
Published	2025-04-16 13:02:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, I.2.10

BoTTA: Benchmarking on-device Test Time Adaptation

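Posted: April 17, 2025 by jarxiv

Summary

The performance of deep learning models depends heavily on the test samples encountered at runtime, and shifts from the training data distribution can significantly reduce accuracy.
Test-time adaptation (TTA) addresses this by adapting models during inference without requiring labeled test data or access to the original training set.
While research has explored TTA from various perspectives, such as algorithmic complexity, data and class distribution shifts, model architectures, and offline versus continual learning, the constraints specific to mobile and edge devices remain underexplored.
We propose BoTTA, a benchmark designed to evaluate TTA methods under the practical constraints of mobile and edge devices.
Our evaluation targets four key challenges caused by limited resources and usage conditions: (i) limited test samples, (ii) limited exposure to categories, (iii) diverse distribution shifts, and (iv) overlapping shifts within a sample.
Using benchmark datasets, we evaluate state-of-the-art TTA methods under these scenarios and report system-level metrics on a real testbed.
Furthermore, unlike prior work, we align with on-device requirements by advocating periodic adaptation instead of continuous inference-time adaptation.
Experiments reveal key insights: many recent TTA algorithms struggle with small datasets, fail to generalize to unseen categories, and depend on the diversity and complexity of distribution shifts.
BoTTA also reports device-specific resource usage.
For example, while SHOT improves accuracy by $2.25\times$ with $512$ adaptation samples, it uses $1.08\times$ the peak memory of the base model on a Raspberry Pi.
BoTTA offers actionable guidance for TTA in real-world, resource-constrained deployments.

Summary (original)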

The performance of deep learning models depends heavily on test samples at runtime, and shifts from the training data distribution can significantly reduce accuracy. Test-time adaptation (TTA) addresses this by adapting models during inference without requiring labeled test data or access to the original training set. While research has explored TTA from various perspectives like algorithmic complexity, data and class distribution shifts, model architectures, and offline versus continuous learning, constraints specific to mobile and edge devices remain underexplored. We propose BoTTA, a benchmark designed to evaluate TTA methods under practical constraints on mobile and edge devices. Our evaluation targets four key challenges caused by limited resources and usage conditions: (i) limited test samples, (ii) limited exposure to categories, (iii) diverse distribution shifts, and (iv) overlapping shifts within a sample. We assess state-of-the-art TTA methods under these scenarios using benchmark datasets and report system-level metrics on a real testbed. Furthermore, unlike prior work, we align with on-device requirements by advocating periodic adaptation instead of continuous inference-time adaptation. Experiments reveal key insights: many recent TTA algorithms struggle with small datasets, fail to generalize to unseen categories, and depend on the diversity and complexity of distribution shifts. BoTTA also reports device-specific resource use. For example, while SHOT improves accuracy by $2.25\times$ with $512$ adaptation samples, it uses $1.08\times$ peak memory on Raspberry Pi versus the base model. BoTTA offers actionable guidance for TTA in real-world, resource-constrained deployments.
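BoTTA advocates periodic adaptation instead of continuous inference-time adaptation. The sketch below shows one hypothetical way to organize such a schedule: every sample is served by inference immediately, unlabeled samples are buffered, and an adaptation step runs only when the buffer fills (512 here, echoing the adaptation-sample count quoted above). The class name and the `predict`/`adapt` callbacks are placeholders, not BoTTA's code; a real TTA method such as SHOT would update model parameters inside `adapt`.

```python
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")  # sample type
P = TypeVar("P")  # prediction type


class PeriodicAdapter(Generic[S, P]):
    """Hypothetical scheduler: infer on every sample, adapt once per full buffer."""

    def __init__(
        self,
        predict: Callable[[S], P],
        adapt: Callable[[List[S]], None],
        period: int = 512,
    ) -> None:
        self.predict = predict
        self.adapt = adapt
        self.period = period
        self.buffer: List[S] = []
        self.adaptations = 0

    def step(self, sample: S) -> P:
        # Inference always uses the current model; no per-sample updates.
        prediction = self.predict(sample)
        self.buffer.append(sample)
        if len(self.buffer) >= self.period:
            # Adapt on the accumulated unlabeled batch, then start a fresh buffer.
            self.adapt(self.buffer)
            self.buffer = []
            self.adaptations += 1
        return prediction


# Usage with stand-in callbacks: 1300 samples trigger two adaptation rounds.
batch_sizes: List[int] = []
scheduler = PeriodicAdapter(predict=lambda s: s * 2, adapt=lambda batch: batch_sizes.append(len(batch)))
outputs = [scheduler.step(i) for i in range(1300)]
```

Bounding adaptation to fixed intervals keeps the extra compute and peak memory of an update out of the per-sample inference path, which is the kind of trade-off the benchmark measures on-device.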

arXiv information

Authors	Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
Published	2025-04-16 13:16:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Optimizing Compound Retrieval Systems

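Posted: April 17, 2025 by jarxiv

Summary

Modern retrieval systems do not rely on a single ranking model to construct their rankings.
Instead, they generally take a cascading approach in which a sequence of ranking models is applied over multiple re-ranking stages.
This balances the quality of the top-K ranking against computational cost by limiting the number of documents each model re-ranks.
However, the cascading approach is not the only way models can interact to form a retrieval system.
We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models.
This encapsulates cascading models but also allows types of interaction other than top-K re-ranking.
In particular, it enables interactions with large language models (LLMs) that can provide relative relevance comparisons.
We focus on optimizing compound retrieval system design, which uniquely involves learning where to apply the component models and how to aggregate their predictions into a final ranking.
This work shows how our compound approach can combine the classic BM25 retrieval model with state-of-the-art (pairwise) LLM relevance predictions while optimizing a given ranking metric and efficiency target.
Our experimental results show that optimized compound retrieval systems provide better trade-offs between effectiveness and efficiency than cascading approaches, even when applied in a self-supervised manner.
With the introduction of compound retrieval systems, we hope to inspire the information retrieval field toward more out-of-the-box thinking about how prediction models can interact to form rankings.

Summary (original)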

Modern retrieval systems do not rely on a single ranking model to construct their rankings. Instead, they generally take a cascading approach where a sequence of ranking models are applied in multiple re-ranking stages. Thereby, they balance the quality of the top-K ranking with computational costs by limiting the number of documents each model re-ranks. However, the cascading approach is not the only way models can interact to form a retrieval system. We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models. This encapsulates cascading models but also allows other types of interactions than top-K re-ranking. In particular, we enable interactions with large language models (LLMs) which can provide relative relevance comparisons. We focus on the optimization of compound retrieval system design which uniquely involves learning where to apply the component models and how to aggregate their predictions into a final ranking. This work shows how our compound approach can combine the classic BM25 retrieval model with state-of-the-art (pairwise) LLM relevance predictions, while optimizing a given ranking metric and efficiency target. Our experimental results show optimized compound retrieval systems provide better trade-offs between effectiveness and efficiency than cascading approaches, even when applied in a self-supervised manner. With the introduction of compound retrieval systems, we hope to inspire the information retrieval field to more out-of-the-box thinking on how prediction models can interact to form rankings.
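The cascading baseline described above can be sketched as follows: a cheap first stage (textbook Okapi BM25 with the common defaults k1 = 1.2, b = 0.75) ranks the whole collection, and each later stage re-ranks only the current top-k. The stage scorers are hypothetical stand-ins for costlier models such as LLM relevance predictors; the paper's compound systems go further by learning where to apply each model and how to aggregate pairwise predictions, which this sketch does not attempt.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[List[str], List[str]], float]  # (query tokens, doc tokens) -> score


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(s)
    return scores


def cascade(query: List[str], docs: List[List[str]], stages: Sequence[Tuple[Scorer, int]]) -> List[int]:
    """Rank all docs with BM25, then let each (scorer, k) stage re-rank only the current top-k."""
    first = bm25_scores(query, docs)
    ranking = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)
    for scorer, k in stages:
        head, tail = ranking[:k], ranking[k:]
        # The expensive scorer is called only k times per stage.
        head = sorted(head, key=lambda i: scorer(query, docs[i]), reverse=True)
        ranking = head + tail
    return ranking


docs = [["cat", "sat"], ["dog", "ran", "fast"], ["cat", "cat", "mat"]]
ranking = cascade(["cat"], docs, stages=[(lambda q, d: -len(d), 2)])  # toy second stage
```

The cutoff k is the efficiency knob: smaller values mean fewer expensive calls but a higher risk that a relevant document never reaches the later stage.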

arXiv information

Authors	Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang
Published	2025-04-16 13:18:16+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.IR, cs.LG

Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure

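Posted: April 17, 2025 by jarxiv

Summary

Graph neural networks are becoming increasingly popular in machine learning because of their unique ability to process graph-structured data.
They have also been applied in safety-critical environments where perturbations inherently occur.
However, because neural networks are prone to adversarial attacks, these perturbations require us to formally verify neural networks before deploying them in safety-critical environments.
While research exists on the formal verification of neural networks, no prior work verifies the robustness of generic graph convolutional network architectures with uncertainty in both the node features and the graph structure over multiple message-passing steps.
This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes.
We demonstrate our approach on three popular benchmark datasets.

Summary (original)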

Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, these perturbations require us to formally verify neural networks before their deployment in safety-critical environments as neural networks are prone to adversarial attacks. While there exists research on the formal verification of neural networks, there is no work verifying the robustness of generic graph convolutional network architectures with uncertainty in the node features and in the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.
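The reachability analysis above uses (matrix) polynomial zonotopes, which can capture non-convex dependencies. As a much simpler illustration of the underlying idea, the sketch below uses an ordinary linear zonotope $Z = \{c + G\beta \mid \beta \in [-1,1]^m\}$ to bound the output of one affine map $x \mapsto Wx + b$ when the input features are uncertain. Affine maps transform zonotopes exactly ($c \mapsto Wc + b$, $G \mapsto WG$), and per-dimension bounds follow from $c_i \pm \sum_j |G_{ij}|$. Nonlinear activations, graph-structure uncertainty, and polynomial generators, all handled in the paper, are not covered; the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def affine_zonotope(W: Matrix, b: Vector, c: Vector, G: Matrix) -> Tuple[Vector, Matrix]:
    """Exact image of the zonotope {c + G @ beta : beta in [-1, 1]^m} under x -> W @ x + b."""
    m = len(G[0]) if G else 0  # number of generators
    new_c = [sum(wik * ck for wik, ck in zip(row, c)) + bi for row, bi in zip(W, b)]
    new_G = [[sum(row[k] * G[k][j] for k in range(len(c))) for j in range(m)] for row in W]
    return new_c, new_G


def interval_hull(c: Vector, G: Matrix) -> List[Tuple[float, float]]:
    """Tightest per-dimension interval enclosing the zonotope: c_i -/+ sum_j |G_ij|."""
    return [(ci - sum(abs(g) for g in row), ci + sum(abs(g) for g in row)) for ci, row in zip(c, G)]


# Input features x1 in [0.9, 1.1] and x2 in [1.9, 2.1], one generator per feature.
c, G = affine_zonotope(
    W=[[1.0, 1.0], [1.0, -1.0]], b=[0.0, 0.0],
    c=[1.0, 2.0], G=[[0.1, 0.0], [0.0, 0.1]],
)
bounds = interval_hull(c, G)  # approximately [(2.8, 3.2), (-1.2, -0.8)]
```

Keeping the generators shared across outputs is what preserves dependencies between neurons; collapsing to intervals after every layer would lose them and loosen the bounds.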

arXiv information

Authors	Tobias Ladner, Michael Eichelbeck, Matthias Althoff
Published	2025-04-16 13:23:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

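Posted: April 17, 2025 by jarxiv

Summary

Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content.
However, existing approaches mostly focus on high-quality short video generation (typically 16 or 24 frames) and end up with hard cuts when naively extended to long video synthesis.
To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for generating long videos of 80, 240, 600, 1200 or more frames with smooth transitions.
The key components are: (i) a short-term memory block called the conditional attention module (CAM), which conditions the current generation on features extracted from the previous chunk via an attention mechanism, leading to consistent chunk transitions; (ii) a long-term memory block called the appearance preservation module, which extracts high-level scene and object features from the first video chunk to prevent the model from forgetting the initial scene; and (iii) a randomized blending approach that makes it possible to apply a video enhancer autoregressively to infinitely long videos without inconsistencies between chunks.
Experiments show that StreamingT2V generates a high amount of motion.
In contrast, all competing image-to-video methods are prone to video stagnation when applied naively in an autoregressive manner.
Thus, with StreamingT2V we propose a high-quality, seamless text-to-long-video generator that outperforms competitors in consistency and motion.
Our code will be available at https://github.com/Picsart-AI-Research/StreamingT2V

Summary (original)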

Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video generation (typically 16 or 24 frames), ending up with hard-cuts when naively extended to the case of long video synthesis. To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions. The key components are:(i) a short-term memory block called conditional attention module (CAM), which conditions the current generation on the features extracted from the previous chunk via an attentional mechanism, leading to consistent chunk transitions, (ii) a long-term memory block called appearance preservation module, which extracts high-level scene and object features from the first video chunk to prevent the model from forgetting the initial scene, and (iii) a randomized blending approach that enables to apply a video enhancer autoregressively for infinitely long videos without inconsistencies between chunks. Experiments show that StreamingT2V generates high motion amount. In contrast, all competing image-to-video methods are prone to video stagnation when applied naively in an autoregressive manner. Thus, we propose with StreamingT2V a high-quality seamless text-to-long video generator that outperforms competitors with consistency and motion. Our code will be available at: https://github.com/Picsart-AI-Research/StreamingT2V

arXiv information

Authors	Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
Published	2025-04-16 13:38:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM, eess.IV

Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection

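Posted: April 17, 2025 by jarxiv

Summary

Hate speech detection is a crucial area of research in natural language processing and is essential for ensuring the safety of online communities.
However, detecting implicit hate speech, where harmful intent is conveyed in subtle or indirect ways, remains a major challenge.
Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases, which makes them harder to identify consistently.
In addition, the interpretation of such speech is influenced by external knowledge and demographic biases, resulting in varied detection results across different language models.
Furthermore, large language models often show heightened sensitivity to toxic language and to references to vulnerable groups, which can lead to misclassification.
This over-sensitivity results in false positives (incorrectly identifying harmless statements as hateful) and false negatives (failing to detect genuinely harmful content).
Addressing these issues requires methods that not only improve detection precision but also reduce model biases and enhance robustness.
To address these challenges, we propose a novel method that uses in-context learning without requiring model fine-tuning.
By adaptively retrieving demonstrations that focus on similar groups or those with the highest similarity scores, our approach improves contextual comprehension.
Experimental results show that our method outperforms current state-of-the-art techniques.
Implementation details and code are available at TBD.

Summary (original)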

Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. However, detecting implicit hate speech, where harmful intent is conveyed in subtle or indirect ways, remains a major challenge. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases, making them more challenging to identify consistently. Additionally, the interpretation of such speech is influenced by external knowledge and demographic biases, resulting in varied detection results across different language models. Furthermore, Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. This over-sensitivity results in false positives (incorrectly identifying harmless statements as hateful) and false negatives (failing to detect genuinely harmful content). Addressing these issues requires methods that not only improve detection precision but also reduce model biases and enhance robustness. To address these challenges, we propose a novel method, which utilizes in-context learning without requiring model fine-tuning. By adaptively retrieving demonstrations that focus on similar groups or those with the highest similarity scores, our approach enhances contextual comprehension. Experimental results show that our method outperforms current state-of-the-art techniques. Implementation details and code are available at TBD.
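The retrieval step above (choosing in-context demonstrations by similarity, optionally restricted to the same target group) can be illustrated with plain cosine similarity over precomputed embeddings. The abstract does not say how texts are embedded or how group matching and similarity are combined, so the tuple format, the optional `group` filter with fallback to the whole pool, and the prompt template below are all hypothetical assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical example format: (text, label, target_group, embedding).
Example = Tuple[str, str, str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_demonstrations(
    query_emb: Sequence[float], pool: List[Example], k: int = 2, group: Optional[str] = None
) -> List[Example]:
    """Top-k demonstrations by cosine similarity, restricted to `group` when any match exists."""
    candidates = [ex for ex in pool if group is None or ex[2] == group] or list(pool)
    ranked = sorted(candidates, key=lambda ex: cosine(query_emb, ex[3]), reverse=True)
    return ranked[:k]


def build_prompt(query_text: str, demos: List[Example]) -> str:
    """Hypothetical few-shot template: labeled demonstrations, then the unlabeled query."""
    blocks = [f"Text: {text}\nLabel: {label}" for text, label, _, _ in demos]
    blocks.append(f"Text: {query_text}\nLabel:")
    return "\n\n".join(blocks)


pool: List[Example] = [
    ("example a", "hateful", "g1", [1.0, 0.0]),
    ("example b", "not hateful", "g2", [0.0, 1.0]),
    ("example c", "not hateful", "g1", [0.9, 0.1]),
]
prompt = build_prompt("new post", select_demonstrations([1.0, 0.0], pool, k=2))
```

Because only the prompt changes, the underlying model needs no fine-tuning, which matches the in-context-learning setting the abstract describes.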

arXiv information

Authors	Yumin Kim, Hwanhee Lee
Published	2025-04-16 13:43:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

AttentionDrop: A Novel Regularization Method for Transformer Models

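Posted: April 17, 2025 by jarxiv

Summary

Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech.
However, their immense capacity often leads to overfitting, especially when training data are limited or noisy.
We propose AttentionDrop, a unified family of stochastic regularization techniques that operate directly on self-attention distributions.
We introduce three variants:
1. Hard Attention Masking: randomly zeroes out the top-k attention logits per query to encourage diverse context utilization.
2. Blurred Attention Smoothing: applies a dynamic Gaussian convolution over the attention logits to diffuse overly peaked distributions.
3. Consistency-Regularized AttentionDrop: enforces output stability under multiple independent AttentionDrop perturbations via a KL-based consistency loss.

Summary (original)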

Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech. However, their immense capacity often leads to overfitting, especially when training data is limited or noisy. We propose AttentionDrop, a unified family of stochastic regularization techniques that operate directly on the self-attention distributions. We introduce three variants: 1. Hard Attention Masking: randomly zeroes out top-k attention logits per query to encourage diverse context utilization. 2. Blurred Attention Smoothing: applies a dynamic Gaussian convolution over attention logits to diffuse overly peaked distributions. 3. Consistency-Regularized AttentionDrop: enforces output stability under multiple independent AttentionDrop perturbations via a KL-based consistency loss.
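Two of the variants above can be sketched for a single query row. Hard Attention Masking is read here as: take the top-k logits and set each one to zero independently with probability p, then apply a softmax; the exact sampling rule is an assumption, since the abstract does not specify it. The consistency term is shown as a KL divergence between two independently perturbed distributions; the paper applies its KL loss to model outputs, so comparing attention rows here is only an illustration. Function names and the k and p values are hypothetical.

```python
import math
import random
from typing import List


def hard_attention_mask(logits: List[float], k: int, p: float, rng: random.Random) -> List[float]:
    """Zero each of the top-k logits independently with probability p (assumed sampling rule)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    out = list(logits)
    for i in top:
        if rng.random() < p:
            out[i] = 0.0
    return out


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Two independent perturbations of the same query row and their consistency penalty.
rng = random.Random(0)
logits = [4.0, 2.0, 1.0, -1.0]
attn_a = softmax(hard_attention_mask(logits, k=2, p=0.5, rng=rng))
attn_b = softmax(hard_attention_mask(logits, k=2, p=0.5, rng=rng))
consistency_loss = kl_divergence(attn_a, attn_b)
```

Zeroing a large positive logit lowers, but does not remove, that position's weight, which pushes probability mass toward other context positions; in training, the consistency penalty would be added to the task loss.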

arXiv information

Authors	Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah
Published	2025-04-16 13:51:16+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CV, cs.LG
