jarxiv | Japanese arxiv | Page 1859

ALPET: Active Few-shot Learning for Citation Worthiness Detection in Low-Resource Wikipedia Languages

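Posted: February 6, 2025 | Author: jarxiv

Summary

Citation worthiness detection (CWD) consists of determining which sentences in an article or collection need to be backed by a citation to verify the information they provide.
This study introduces ALPET, a framework that combines active learning (AL) with Pattern-Exploiting Training (PET) to strengthen CWD for languages with limited data resources.
Applied to Catalan, Basque, and Albanian Wikipedia datasets, ALPET outperforms the existing CCW baseline while reducing the amount of labeled data, in some cases by more than 80%.
ALPET's performance plateaus after 300 labeled samples, indicating that it suits low-resource scenarios where large labeled datasets are uncommon.
Specific active learning query strategies, such as those that use K-Means clustering, can offer advantages, but their effectiveness is not universal, and they often yield only marginal gains over random sampling, particularly with smaller datasets.
This suggests that random sampling, despite its simplicity, remains a strong baseline for CWD in resource-constrained environments.
Overall, ALPET's ability to achieve high performance with fewer labeled samples makes it a promising tool for improving the verifiability of online content in low-resource language settings.

Summary (original)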

Citation Worthiness Detection (CWD) consists in determining which sentences, within an article or collection, should be backed up with a citation to validate the information they provide. This study introduces ALPET, a framework combining Active Learning (AL) and Pattern-Exploiting Training (PET), to enhance CWD for languages with limited data resources. Applied to Catalan, Basque, and Albanian Wikipedia datasets, ALPET outperforms the existing CCW baseline while reducing the amount of labeled data, in some cases by more than 80%. ALPET’s performance plateaus after 300 labeled samples, showing its suitability for low-resource scenarios where large, labeled datasets are not common. While specific active learning query strategies, like those employing K-Means clustering, can offer advantages, their effectiveness is not universal, and they often yield marginal gains over random sampling, particularly with smaller datasets. This suggests that random sampling, despite its simplicity, remains a strong baseline for CWD in resource-constrained environments. Overall, ALPET’s ability to achieve high performance with fewer labeled samples makes it a promising tool for enhancing the verifiability of online content in low-resource language settings.
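The abstract does not say how the K-Means query strategy picks sentences to label, so the following is only a minimal sketch of one common variant, under stated assumptions: sentences are already embedded as numeric vectors, the unlabeled pool is clustered into as many clusters as the labeling budget, and the item nearest each centroid is queried. The function names (`kmeans`, `kmeans_query`, `random_query`) and the toy pool are hypothetical and are not taken from ALPET; `random_query` stands in for the random-sampling baseline the abstract compares against.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm over tuples of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


def kmeans_query(pool, budget, seed=0):
    """Assumed strategy: query the pool item closest to each of `budget` centroids."""
    chosen = []
    for centroid in kmeans(pool, budget, seed=seed):
        candidates = [i for i in range(len(pool)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: math.dist(pool[i], centroid)))
    return chosen


def random_query(pool, budget, seed=0):
    """Random-sampling baseline: pick `budget` distinct pool indices uniformly."""
    return random.Random(seed).sample(range(len(pool)), budget)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two well-separated groups of three items each.
    pool = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print("k-means query:", sorted(kmeans_query(pool, budget=2)))
    print("random query: ", sorted(random_query(pool, budget=2)))
```

On a pool like this the clustering-based query spreads the labeling budget across both groups, whereas random sampling may or may not; with small pools the difference is often slight, which is in line with the marginal gains the abstract reports.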

arXiv information

Authors	Aida Halitaj, Arkaitz Zubiaga
Published	2025-02-05 15:49:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters

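Posted: February 6, 2025 | Author: jarxiv

Summary

Giving patients greater access to their medical documents improves care, but this benefit is limited by varying health literacy levels and complex medical terminology.
Large language models (LLMs) offer a solution by simplifying medical information.
However, evaluating LLMs for safe and patient-friendly text generation is difficult because standardized evaluation resources are lacking.
To fill this gap, we developed MeDiSumQA.
MeDiSumQA is a dataset created from MIMIC-IV discharge summaries through an automated pipeline that combines LLM-based question-answer generation with manual quality checks.
We use this dataset to evaluate various LLMs on patient-oriented question answering.
Our findings show that general-purpose LLMs frequently surpass biomedically adapted models, while automated metrics correlate with human judgment.
By releasing MeDiSumQA on PhysioNet, we aim to advance the development of LLMs that enhance patient understanding and ultimately improve care outcomes.

Summary (original)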

While increasing patients’ access to medical documents improves medical care, this benefit is limited by varying health literacy levels and complex medical terminology. Large language models (LLMs) offer solutions by simplifying medical information. However, evaluating LLMs for safe and patient-friendly text generation is difficult due to the lack of standardized evaluation resources. To fill this gap, we developed MeDiSumQA. MeDiSumQA is a dataset created from MIMIC-IV discharge summaries through an automated pipeline combining LLM-based question-answer generation with manual quality checks. We use this dataset to evaluate various LLMs on patient-oriented question-answering. Our findings reveal that general-purpose LLMs frequently surpass biomedical-adapted models, while automated metrics correlate with human judgment. By releasing MeDiSumQA on PhysioNet, we aim to advance the development of LLMs to enhance patient understanding and ultimately improve care outcomes.

arXiv information

Authors	Amin Dada, Osman Alperen Koras, Marie Bauer, Amanda Butler, Kaleb E. Smith, Jens Kleesiek, Julian Friedrich
Published	2025-02-05 15:56:37+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

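Posted: February 6, 2025 | Author: jarxiv

Summary

Large language models (LLMs) excel across a wide range of tasks, but standard first-order (FO) fine-tuning demands considerable memory, which significantly limits real-world deployment.
Recently, zeroth-order (ZO) optimization has stood out as a promising memory-efficient training paradigm: it avoids backward passes and relies solely on forward passes for gradient estimation, which makes it attractive for resource-constrained scenarios.
However, ZO methods lag far behind FO methods in both convergence speed and accuracy.
To bridge this gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update patterns of FO and ZO optimization.
Aiming to match the learning capacity of FO methods based on these findings, we propose Divergence-driven Zeroth-Order (DiZO) optimization.
DiZO performs divergence-driven layer adaptation by incorporating projections into ZO updates, generating updates of diverse magnitudes that are precisely scaled to the optimization needs of each layer.
Our results show that DiZO significantly reduces the number of iterations needed for convergence without sacrificing throughput, cutting training GPU hours by up to 48% on various datasets.
Moreover, DiZO consistently outperforms representative ZO baselines when fine-tuning RoBERTa-large, the OPT series, and the Llama series on downstream tasks, and in some cases even surpasses memory-intensive FO fine-tuning.

Summary (original)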

Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization stood out as a promising memory-efficient training paradigm, avoiding backward passes and relying solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO method lags far behind FO method in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update pattern of FO and ZO optimization. Aiming to resemble the learning capacity of FO method from the findings, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections to ZO updates, generating diverse-magnitude updates precisely scaled to layer-wise individual optimization needs. Our results demonstrate that DiZO significantly reduces the needed iterations for convergence without sacrificing throughput, cutting training GPU hours by up to 48% on various datasets. Moreover, DiZO consistently outperforms the representative ZO baselines in fine-tuning RoBERTa-large, OPT-series, and Llama-series on downstream tasks and, in some cases, even surpasses memory-intensive FO fine-tuning.
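The forward-pass-only gradient estimate described in the abstract can be illustrated with a generic two-point (SPSA-style) zeroth-order estimator. This is a minimal sketch of plain ZO-SGD on a toy quadratic, not DiZO itself: the layer-wise projection step is not reproduced here, and the names `zo_gradient` and `zo_sgd`, the step sizes, and the toy loss are all assumptions made for illustration.

```python
import random


def zo_gradient(loss, theta, eps=1e-3, rng=None):
    """Estimate the gradient of `loss` at `theta` from two loss evaluations only."""
    rng = rng or random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in theta]               # random perturbation direction
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    scale = (plus - minus) / (2 * eps)                     # directional derivative along z
    return [scale * zi for zi in z]


def zo_sgd(loss, theta, lr=0.05, steps=500, seed=0):
    """Plain ZO-SGD: no backward pass, only paired loss evaluations per step."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = zo_gradient(loss, theta, rng=rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0]                              # hypothetical optimum

    def quadratic(th):
        return sum((t - a) ** 2 for t, a in zip(th, target))

    print(zo_sgd(quadratic, [0.0, 0.0, 0.0]))
```

Each step needs two loss evaluations and no stored activations, which is where the memory saving comes from; the price is a noisy estimate, consistent with the slower convergence the abstract attributes to ZO methods.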

arXiv information

Authors	Qitao Tan, Jun Liu, Zheng Zhan, Caiwei Ding, Yanzhi Wang, Jin Lu, Geng Yuan
Published	2025-02-05 16:03:17+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

Causal Composition Diffusion Model for Closed-loop Traffic Generation

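Posted: February 6, 2025 | Author: jarxiv

Summary

Simulation is critical for safety evaluation in autonomous driving, particularly for capturing complex interactive behaviors.
However, generating realistic and controllable traffic scenarios in long-tail situations remains a significant challenge.
Existing generative models suffer from conflicting objectives between user-defined controllability and realism constraints, a conflict that is amplified in safety-critical contexts.
In this work, we introduce the Causal Compositional Diffusion Model (CCDiff), a structure-guided diffusion framework that addresses these challenges.
We first formulate the learning of controllable and realistic closed-loop simulation as a constrained optimization problem.
CCDiff then maximizes controllability while adhering to realism by automatically identifying causal structures and injecting them directly into the diffusion process, which provides structured guidance that enhances both realism and controllability.
Through rigorous evaluations on benchmark datasets and in a closed-loop simulator, CCDiff demonstrates substantial gains over state-of-the-art approaches in generating realistic, user-preferred trajectories.
Our results show CCDiff's effectiveness in extracting and leveraging causal structures, with improved closed-loop performance on key metrics such as collision rate, off-road rate, FDE, and comfort.

Summary (original)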

Simulation is critical for safety evaluation in autonomous driving, particularly in capturing complex interactive behaviors. However, generating realistic and controllable traffic scenarios in long-tail situations remains a significant challenge. Existing generative models suffer from the conflicting objective between user-defined controllability and realism constraints, which is amplified in safety-critical contexts. In this work, we introduce the Causal Compositional Diffusion Model (CCDiff), a structure-guided diffusion framework to address these challenges. We first formulate the learning of controllable and realistic closed-loop simulation as a constrained optimization problem. Then, CCDiff maximizes controllability while adhering to realism by automatically identifying and injecting causal structures directly into the diffusion process, providing structured guidance to enhance both realism and controllability. Through rigorous evaluations on benchmark datasets and in a closed-loop simulator, CCDiff demonstrates substantial gains over state-of-the-art approaches in generating realistic and user-preferred trajectories. Our results show CCDiff’s effectiveness in extracting and leveraging causal structures, showing improved closed-loop performance based on key metrics such as collision rate, off-road rate, FDE, and comfort.
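The abstract lists FDE among its metrics without defining it. Assuming it carries its usual meaning in trajectory prediction (final displacement error, the Euclidean distance between the last predicted position and the last ground-truth position), a minimal sketch looks like this. The function names and the sample trajectories are hypothetical, and the paper's exact variant (for example, averaging over agents or rollouts) may differ.

```python
import math


def final_displacement_error(predicted, ground_truth):
    """Distance between the final (x, y) points of two trajectories."""
    (px, py), (gx, gy) = predicted[-1], ground_truth[-1]
    return math.hypot(px - gx, py - gy)


def mean_fde(pred_trajs, gt_trajs):
    """Average FDE over paired trajectories (an assumed aggregation)."""
    errors = [final_displacement_error(p, g) for p, g in zip(pred_trajs, gt_trajs)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    pred = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
    truth = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
    print(final_displacement_error(pred, truth))
```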

arXiv information

Authors	Haohong Lin, Xin Huang, Tung Phan-Minh, David S. Hayden, Huan Zhang, Ding Zhao, Siddhartha Srinivasa, Eric M. Wolff, Hongge Chen
Published	2025-02-05 16:08:12+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO

A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

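Posted: February 6, 2025 | Author: jarxiv

Summary

The remarkable achievements of artificial intelligence (AI) algorithms, particularly in machine learning (ML) and deep learning (DL), have fueled their extensive deployment across multiple sectors, including software engineering (SE).
However, because of their black-box nature, these promising AI-driven SE models are still far from being deployed in practice.
This lack of explainability poses unwanted risks when they are applied to critical tasks such as vulnerability detection, where decision-making transparency is paramount.
This paper seeks to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models in the context of SE.
The review canvasses work appearing in the most prominent SE and AI conferences and journals, and spans 108 papers across 23 unique SE tasks.
Guided by three key research questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches.
Based on our findings, we identify a set of challenges that remain to be addressed in existing studies, together with a set of guidelines highlighting potential opportunities that we consider appropriate and important for future work.

Summary (original)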

The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 108 papers across 23 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a set of guidelines highlighting potential opportunities we deemed appropriate and important for future work.

arXiv information

Authors	Sicong Cao, Xiaobing Sun, Ratnadira Widyasari, David Lo, Xiaoxue Wu, Lili Bo, Jiale Zhang, Bin Li, Wei Liu, Di Wu, Yixin Chen
Published	2025-02-05 16:10:05+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.SE

How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering

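Posted: February 6, 2025 | Author: jarxiv

Summary

Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire development lifecycle.
Although software engineering research has extensively studied AI tools in software development, the specific types of interaction between developers and these AI-powered tools have only recently begun to receive attention.
Understanding and improving these interactions has the potential to enhance productivity, trust, and efficiency in AI-driven workflows.
In this paper, we propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct types, such as auto-complete code suggestions, command-driven actions, and conversational assistance.
Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development.
By establishing a structured foundation for studying developer-AI interactions, this paper aims to stimulate research on creating more effective and adaptive AI tools for software development.

Summary (original)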

Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire development lifecycle. Although software engineering research has extensively studied AI tools in software development, the specific types of interactions between developers and these AI-powered tools have only recently begun to receive attention. Understanding and improving these interactions has the potential to enhance productivity, trust, and efficiency in AI-driven workflows. In this paper, we propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct interaction types, such as auto-complete code suggestions, command-driven actions, and conversational assistance. Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development. By establishing a structured foundation for studying developer-AI interactions, this paper aims to stimulate research on creating more effective, adaptive AI tools for software development.

arXiv information

Authors	Christoph Treude, Marco A. Gerosa
Published	2025-02-05 16:11:33+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.HC, cs.SE

Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems

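Posted: February 6, 2025 | Author: jarxiv

Summary

This study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems with large language models (LLMs).
Although LLMs show remarkable natural language understanding, engineering them to reliably execute complex business workflows remains challenging.
The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts.
This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency.
We demonstrate the framework's effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot.
These case studies validate CR's ability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility.
The results show that CR lets domain experts design conversational workflows in natural language while leveraging custom functions (tools) developed by software engineers, creating an efficient division of responsibilities in which developers focus on core API implementation and domain experts handle conversation design.
While the framework shows promise in accessibility and adaptability, we identify key challenges, including computational overhead, non-deterministic behavior, and domain-specific logic optimization.
Future research directions include CR evaluation methods based on prompt engineering frameworks driven by goal-oriented grading criteria, improved scalability for complex multi-agent interactions, and greater system robustness to address the identified limitations across diverse business applications.

Summary (original)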

This study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems using Large Language Models (LLMs). While LLMs demonstrate remarkable natural language understanding capabilities, engineering them to reliably execute complex business workflows remains challenging. The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts. This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency. We demonstrate the framework’s effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot. These case studies validate CR’s capability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility. Results show that CR enables domain experts to design conversational workflows in natural language while leveraging custom functions (tools) developed by software engineers, creating an efficient division of responsibilities where developers focus on core API implementation and domain experts handle conversation design. While the framework shows promise in accessibility and adaptability, we identify key challenges including computational overhead, non-deterministic behavior, and domain-specific logic optimization. Future research directions include CR evaluation methods based on prompt engineering frameworks driven by goal-oriented grading criteria, improving scalability for complex multi-agent interactions, and enhancing system robustness to address the identified limitations across diverse business applications.

arXiv information

Authors	Giorgio Robino
Published	2025-02-05 16:21:05+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.ET, cs.HC, cs.PL

Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques

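Posted: February 6, 2025 | Author: jarxiv

Summary

The challenge of formal proof generation has a rich history, but with modern techniques we may finally be at the stage of making real progress on real-life mathematical problems.
This paper explores the integration of ChatGPT and basic searching techniques to simplify the generation of formal proofs, with a particular focus on the miniF2F dataset.
We show how combining a large language model such as ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation.
Despite its simplicity, our best-performing Lean-based model surpasses all known benchmarks with a 31.15% pass rate.
We extend our experiments to other datasets and alternative language models, showing that our models perform comparably in diverse settings and allowing a more nuanced analysis of the results.
Our findings offer insights into AI-assisted formal proof generation and suggest a promising direction for future research in formal mathematical proof.

Summary (original)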

The challenge of formal proof generation has a rich history, but with modern techniques, we may finally be at the stage of making actual progress in real-life mathematical problems. This paper explores the integration of ChatGPT and basic searching techniques to simplify generating formal proofs, with a particular focus on the miniF2F dataset. We demonstrate how combining a large language model like ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation. Despite its simplicity, our best-performing Lean-based model surpasses all known benchmarks with a 31.15% pass rate. We extend our experiments to include other datasets and employ alternative language models, showcasing our models’ comparable performance in diverse settings and allowing for a more nuanced analysis of our results. Our findings offer insights into AI-assisted formal proof generation, suggesting a promising direction for future research in formal mathematical proof.
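To make "verifiable" concrete, here is a small Lean 4 statement with a proof term that the Lean checker either accepts or rejects mechanically. It is only an illustration of what a checked formal proof looks like: the theorem name `add_comm_example` is hypothetical, and the example is not drawn from miniF2F or from the paper's generated proofs.

```lean
-- The statement is the type; the proof is a term of that type.
-- If a language model proposed a wrong proof here, Lean would refuse to compile it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because acceptance is decided by the checker rather than by a human reader, candidate proofs from a language model can be filtered automatically, which is what makes a generate-and-check search feasible.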

arXiv information

Authors	Sangjun Han, Taeil Hur, Youngmi Hur, Kathy Sangkyung Lee, Myungyoon Lee, Hyojae Lim
Published	2025-02-05 16:21:10+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LO

Out-of-Distribution Detection using Synthetic Data Generation

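Posted: February 6, 2025 | Author: jarxiv

Summary

Distinguishing in-distribution from out-of-distribution (OOD) inputs is crucial for the reliable deployment of classification systems.
However, OOD data is typically unavailable or difficult to collect, which poses a significant challenge for accurate OOD detection.
In this work, we present a method that harnesses the generative capabilities of large language models (LLMs) to create high-quality synthetic OOD proxies, eliminating the dependency on any external OOD data source.
We study the efficacy of the method on classical text classification tasks such as toxicity detection and sentiment classification, as well as on classification tasks that arise in LLM development and deployment, such as training a reward model for RLHF and detecting misaligned generations.
Extensive experiments on nine InD-OOD dataset pairs and various model sizes show that the approach dramatically lowers false positive rates while maintaining high accuracy on in-distribution tasks, outperforming baseline methods by a significant margin.

Summary (original)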

Distinguishing in- and out-of-distribution (OOD) inputs is crucial for reliable deployment of classification systems. However, OOD data is typically unavailable or difficult to collect, posing a significant challenge for accurate OOD detection. In this work, we present a method that harnesses the generative capabilities of Large Language Models (LLMs) to create high-quality synthetic OOD proxies, eliminating the dependency on any external OOD data source. We study the efficacy of our method on classical text classification tasks such as toxicity detection and sentiment classification as well as classification tasks arising in LLM development and deployment, such as training a reward model for RLHF and detecting misaligned generations. Extensive experiments on nine InD-OOD dataset pairs and various model sizes show that our approach dramatically lowers false positive rates (achieving a perfect zero in some cases) while maintaining high accuracy on in-distribution tasks, outperforming baseline methods by a significant margin.
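The abstract reports false positive rates without stating the convention used. As a minimal sketch under one assumed convention (an input is flagged as OOD when its score reaches a threshold, and a false positive is an in-distribution input that gets flagged), the metric can be computed as follows. The function name, the scores, and the threshold are hypothetical; the paper may instead report a variant such as the false positive rate at a fixed true positive rate.

```python
def false_positive_rate(ood_scores, is_ood, threshold):
    """Fraction of in-distribution inputs wrongly flagged as OOD."""
    flagged = [score >= threshold for score in ood_scores]
    false_positives = sum(f and not o for f, o in zip(flagged, is_ood))
    negatives = sum(not o for o in is_ood)       # number of in-distribution inputs
    return false_positives / negatives if negatives else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.8, 0.9]                # hypothetical detector outputs
    labels = [False, False, False, True]         # True marks a genuine OOD input
    print(false_positive_rate(scores, labels, threshold=0.5))
```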

arXiv information

Authors	Momin Abbas, Muneeza Azmat, Raya Horesh, Mikhail Yurochkin
Published	2025-02-05 16:22:09+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model

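Posted: February 6, 2025 | Author: jarxiv

Summary

Recent advances in large language models (LLMs) have led to significant successes across various applications, most noticeably through a series of emerging capabilities, particularly In-Context Learning (ICL) and Chain-of-Thought (CoT).
To better understand and control model performance, many studies have begun investigating the underlying causes of these phenomena and their impact on task outcomes.
However, existing explanatory frameworks predominantly focus on isolating and explaining ICL and CoT independently, which leads to an incomplete understanding of their combined influence on model performance.
To address this gap, we propose the Electronic Circuit Model (ECM), which provides a foundation for developing scalable, learnable policies and improving the management of AI-generated content.
Specifically, ECM conceptualizes model behavior as an electronic circuit: ICL is represented as a semantic magnetic field that provides an additional voltage following Faraday's law, while CoT is modeled as series resistors that constrain the model's output performance following Ohm's law.
Experimental results show that ECM effectively predicts and explains LLM performance across a variety of prompting strategies.
Furthermore, we apply ECM to advanced reasoning strategy optimization on a series of tasks, such as the International Olympiad in Informatics (IOI) and the International Mathematical Olympiad (IMO), achieving competitive performance that surpasses nearly 80% of top human competitors.

Summary (original)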

Recent advancements in large language models (LLMs) have led to significant successes across various applications, most noticeably in a series of emerging capabilities, particularly in the areas of In-Context Learning (ICL) and Chain-of-Thought (CoT). To better understand and control model performance, many studies have begun investigating the underlying causes of these phenomena and their impact on task outcomes. However, existing explanatory frameworks predominantly focus on isolating and explaining ICL and CoT independently, leading to an incomplete understanding of their combined influence on model performance. To address this gap, we propose the Electronic Circuit Model (ECM), which provides a foundation for developing scalable, learnable policies and improving the management of AI-generated content. Specifically, ECM conceptualizes model behavior as an electronic circuit: ICL is represented as a semantic magnetic field that provides an additional voltage following Faraday’s Law, while CoT is modeled as series resistors that constrain the model output performance following Ohm’s Law. Experimental results demonstrate that the ECM effectively predicts and explains LLM performance across a variety of prompting strategies. Furthermore, we apply ECM to advanced reasoning strategy optimization on a series of tasks, such as the International Olympiad in Informatics (IOI) and the International Mathematical Olympiad (IMO), achieving competitive performance that surpasses nearly 80% of top human competitors.
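For reference, the two textbook laws the analogy invokes are shown below in their standard physical form. How ECM maps in-context demonstrations to magnetic flux or reasoning steps to resistances is not given in the abstract, so no such mapping is attempted here; the symbols are the usual physics ones (electromotive force, number of coil turns, magnetic flux, current, voltage, and resistances in series).

```latex
% Faraday's law of induction: a changing flux induces an electromotive force.
\mathcal{E} = -N \frac{d\Phi_B}{dt}

% Ohm's law with n resistors in series: resistances add, so current falls as more are chained.
I = \frac{V}{R_{\mathrm{total}}}, \qquad R_{\mathrm{total}} = \sum_{i=1}^{n} R_i
```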

arXiv information

Authors	Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiaqi Wang, Mengkang Hu, Zhi Chen, Wanxiang Che, Ting Liu
Published	2025-02-05 16:22:33+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL
