jarxiv | Japanese arxiv

Convert Language Model into a Value-based Strategic Planner

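Posted: June 18, 2025 | Author: jarxiv

Summary

Emotional support conversation (ESC) aims to alleviate an individual's emotional distress through effective conversation.
Although large language models (LLMs) have made remarkable progress on ESC, most of these studies may not define the diagram from the state-model perspective, and therefore provide suboptimal solutions for long-term satisfaction.
To address this issue, we leverage Q-learning on LLMs and propose a framework called straQ*.
Our framework allows a plug-and-play LLM to bootstrap planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to respond.
Substantial experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and finite state machines.

Summary (original)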

Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Q-learning on LLMs, and propose a framework called straQ*. Our framework allows a plug-and-play LLM to bootstrap the planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to response. Substantial experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and finite state machines.
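
The abstract frames strategy selection as picking the strategy with the highest long-term return. As a rough illustration only, the sketch below shows plain tabular one-step Q-learning over a small set of support strategies. straQ* itself applies Q-learning through an LLM, so this is not the authors' implementation. The strategy labels, state names, reward, and hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical strategy labels for illustration; not the paper's taxonomy.
STRATEGIES = ["question", "reflection", "reassurance", "suggestion"]


def choose_strategy(q, state, epsilon=0.0, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else pick argmax_a Q(state, a)."""
    if rng.random() < epsilon:
        return rng.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    A terminal turn (done=True) has no future return to bootstrap from.
    """
    target = reward
    if not done:
        target += gamma * max(q[(next_state, a)] for a in STRATEGIES)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: Q-values default to 0.0 for unseen (state, strategy) pairs.
q_table = defaultdict(float)
q_update(q_table, "distressed", "reflection", reward=1.0, next_state="calmer", done=True)
print(choose_strategy(q_table, "distressed"))  # prints "reflection"
```

Taking the `max` over the next state's Q-values lets the update look ahead to long-term return instead of relying only on the immediate reward.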

arXiv information

Authors	Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji
Published	2025-06-17 15:43:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor

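Posted: June 18, 2025 | Author: jarxiv

Summary

In AI research and practice, rigor is still understood mainly as methodological rigor, such as whether mathematical, statistical, or computational methods are correctly applied.
We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities.
Our position is that rigorous AI research and practice requires a broader conception of what rigor entails.
We believe such a conception should include, in addition to a more expansive understanding of (1) methodological rigor, aspects related to (2) what background knowledge informs what to work on (epistemic rigor);
(3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor);
(4) how clearly the theoretical constructs in use are articulated (conceptual rigor);
(5) what is reported and how (reporting rigor); and
(6) how well-supported the inferences from existing evidence are (interpretative rigor).
In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work among researchers, policymakers, journalists, and other stakeholders.

Summary (original)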

In AI research and practice, rigor remains largely understood in terms of methodological rigor — such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception — in addition to a more expansive understanding of (1) methodological rigor — should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community’s work by researchers, policymakers, journalists, and other stakeholders.

arXiv information

Authors	Alexandra Olteanu, Su Lin Blodgett, Agathe Balayn, Angelina Wang, Fernando Diaz, Flavio du Pin Calmon, Margaret Mitchell, Michael Ekstrand, Reuben Binns, Solon Barocas
Published	2025-06-17 15:44:41+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CY, cs.LG

Abacus: A Cost-Based Optimizer for Semantic Operator Systems

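Posted: June 18, 2025 | Author: jarxiv

Summary

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents.
Several new programming frameworks let developers build these applications by composing them out of semantic operators: a declarative set of AI-powered data transformations specified in natural language.
These include LLM-powered maps, filters, joins, and more, used for document processing tasks such as information extraction and summarization.
Systems of semantic operators achieve strong performance on benchmarks, but they can be difficult to optimize.
An optimizer in this setting must decide how to physically implement each semantic operator so that the system is optimized globally.
Existing optimizers can apply only a limited number of optimizations, and most (if not all) cannot optimize system quality, cost, or latency subject to constraints on the other dimensions.
In this paper we present Abacus, an extensible, cost-based optimizer that searches for the best implementation of a semantic operator system given a (possibly constrained) optimization objective.
Abacus estimates operator performance by leveraging a minimal set of validation examples and, if available, prior beliefs about operator performance.
We evaluate Abacus on document processing workloads in the biomedical and legal domains (BioDEX; CUAD) and on multi-modal question answering (MMQA).
We demonstrate that systems optimized by Abacus achieve 18.7% to 39.2% better quality, and up to 23.6x lower cost and 4.2x lower latency, than the next best system.

Summary (original)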

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic operators: a declarative set of AI-powered data transformations with natural language specifications. These include LLM-powered maps, filters, joins, etc. used for document processing tasks such as information extraction, summarization, and more. While systems of semantic operators have achieved strong performance on benchmarks, they can be difficult to optimize. An optimizer for this setting must determine how to physically implement each semantic operator in a way that optimizes the system globally. Existing optimizers are limited in the number of optimizations they can apply, and most (if not all) cannot optimize system quality, cost, or latency subject to constraint(s) on the other dimensions. In this paper we present Abacus, an extensible, cost-based optimizer which searches for the best implementation of a semantic operator system given a (possibly constrained) optimization objective. Abacus estimates operator performance by leveraging a minimal set of validation examples and, if available, prior beliefs about operator performance. We evaluate Abacus on document processing workloads in the biomedical and legal domains (BioDEX; CUAD) and multi-modal question answering (MMQA). We demonstrate that systems optimized by Abacus achieve 18.7%-39.2% better quality and up to 23.6x lower cost and 4.2x lower latency than the next best system.
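
A brute-force search can illustrate the constrained objective: optimize one dimension subject to a limit on another. The sketch assumes each operator already has a few candidate physical implementations with estimated quality and cost. Abacus derives such estimates from validation examples and optional priors; here they are simply given. It also assumes that end-to-end quality is the product of per-operator qualities and that costs add. These modelling choices, the operator and implementation names, and `best_plan` are hypothetical. Abacus's actual search strategy is not reproduced here.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# operator name -> list of (implementation name, est. quality in [0, 1], est. cost)
Candidates = Dict[str, List[Tuple[str, float, float]]]
Plan = Tuple[Dict[str, str], float, float]


def best_plan(candidates: Candidates, max_cost: float) -> Optional[Plan]:
    """Pick one implementation per operator to maximize estimated pipeline quality
    subject to total estimated cost <= max_cost. Returns None if no plan fits."""
    ops = list(candidates)
    best: Optional[Plan] = None
    # Exhaustive enumeration: fine for a few operators, exponential in general.
    for choice in product(*(candidates[op] for op in ops)):
        quality = 1.0
        cost = 0.0
        for _, q, c in choice:
            quality *= q  # assumption: per-operator errors compound multiplicatively
            cost += c     # assumption: per-operator costs add up
        if cost <= max_cost and (best is None or quality > best[1]):
            plan = {op: impl for op, (impl, _, _) in zip(ops, choice)}
            best = (plan, quality, cost)
    return best


candidates = {
    "filter": [("small-llm", 0.80, 1.0), ("large-llm", 0.95, 5.0)],
    "map": [("small-llm", 0.70, 2.0), ("large-llm", 0.90, 8.0)],
}
print(best_plan(candidates, max_cost=10.0))
```

With a budget of 10, the all-large plan (cost 13) is excluded. The sketch therefore returns the small-model filter paired with the large-model map (quality about 0.72, cost 9).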

arXiv information

Authors	Matthew Russo, Sivaprasad Sudhir, Gerardo Vitagliano, Chunwei Liu, Tim Kraska, Samuel Madden, Michael Cafarella
Published	2025-06-17 15:45:12+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.DB, H.2.4

Accurate and scalable exchange-correlation with deep learning

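Posted: June 18, 2025 | Author: jarxiv

Summary

Density functional theory (DFT) is the most widely used electronic structure method for predicting the properties of molecules and materials.
Although DFT is in principle an exact reformulation of the Schrödinger equation, practical applications rely on approximations to the unknown exchange-correlation (XC) functional.
Most existing XC functionals are built from a limited set of increasingly complex, hand-crafted features that improve accuracy at the expense of computational efficiency.
Yet no current approximation achieves the accuracy and generality needed for predictive modeling of laboratory experiments at chemical accuracy, typically defined as errors below 1 kcal/mol.
In this work, we present Skala, a modern deep learning-based XC functional that bypasses expensive hand-designed features by learning representations directly from data.
Skala achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency typical of semi-local DFT.
This performance is enabled by training on an unprecedented volume of high-accuracy reference data generated with computationally intensive wavefunction-based methods.
Notably, Skala improves systematically with additional training data covering diverse chemistry.
By incorporating a modest amount of additional high-accuracy data tailored to chemistry beyond atomization energies, Skala achieves accuracy competitive with the best-performing hybrid functionals across general main-group chemistry, at the cost of semi-local DFT.
As the training dataset continues to grow, Skala is poised to further enhance the predictive power of first-principles simulations.

Summary (original)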

Density Functional Theory (DFT) is the most widely used electronic structure method for predicting the properties of molecules and materials. Although DFT is, in principle, an exact reformulation of the Schrödinger equation, practical applications rely on approximations to the unknown exchange-correlation (XC) functional. Most existing XC functionals are constructed using a limited set of increasingly complex, hand-crafted features that improve accuracy at the expense of computational efficiency. Yet, no current approximation achieves the accuracy and generality for predictive modeling of laboratory experiments at chemical accuracy — typically defined as errors below 1 kcal/mol. In this work, we present Skala, a modern deep learning-based XC functional that bypasses expensive hand-designed features by learning representations directly from data. Skala achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency typical of semi-local DFT. This performance is enabled by training on an unprecedented volume of high-accuracy reference data generated using computationally intensive wavefunction-based methods. Notably, Skala systematically improves with additional training data covering diverse chemistry. By incorporating a modest amount of additional high-accuracy data tailored to chemistry beyond atomization energies, Skala achieves accuracy competitive with the best-performing hybrid functionals across general main group chemistry, at the cost of semi-local DFT. As the training dataset continues to expand, Skala is poised to further enhance the predictive power of first-principles simulations.
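
The abstract defines chemical accuracy as errors below 1 kcal/mol. The sketch below shows how that threshold is commonly checked. It converts energy errors from hartree to kcal/mol and compares the mean absolute error (MAE) against 1 kcal/mol. The function names, the use of MAE as the aggregate, and the sample energies are assumptions for illustration. They are not the paper's evaluation protocol.

```python
# 1 hartree expressed in kcal/mol (CODATA 2018 hartree energy, thermochemical calorie).
HARTREE_TO_KCAL_PER_MOL = 627.5094740631
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def mae_kcal_per_mol(predicted_hartree, reference_hartree):
    """Mean absolute error between two equal-length lists of energies given in hartree."""
    if len(predicted_hartree) != len(reference_hartree) or not predicted_hartree:
        raise ValueError("need two non-empty lists of equal length")
    errors = [
        abs(p - r) * HARTREE_TO_KCAL_PER_MOL
        for p, r in zip(predicted_hartree, reference_hartree)
    ]
    return sum(errors) / len(errors)


def within_chemical_accuracy(predicted_hartree, reference_hartree):
    """True if the MAE is strictly below 1 kcal/mol."""
    mae = mae_kcal_per_mol(predicted_hartree, reference_hartree)
    return mae < CHEMICAL_ACCURACY_KCAL_PER_MOL


# Hypothetical atomization energies (hartree): a 1 mHa error is about 0.63 kcal/mol.
print(within_chemical_accuracy([0.3720, 0.4410], [0.3730, 0.4400]))  # prints True
```

The conversion factor shows why the target is demanding: 1 kcal/mol is only about 1.6 millihartree.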

arXiv information

Authors	Giulia Luise, Chin-Wei Huang, Thijs Vogels, Derk P. Kooi, Sebastian Ehlert, Stephanie Lanius, Klaas J. H. Giesbertz, Amir Karton, Deniz Gunceler, Megan Stanley, Wessel P. Bruinsma, Lin Huang, Xinran Wei, José Garrido Torres, Abylay Katbashev, Bálint Máté, Sékou-Oumar Kaba, Roberto Sordillo, Yingrong Chen, David B. Williams-Young, Christopher M. Bishop, Jan Hermann, Rianne van den Berg, Paola Gori-Giorgi
Published	2025-06-17 15:56:56+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CE, cs.LG, physics.chem-ph, physics.comp-ph

StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery

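Posted: June 18, 2025 | Author: jarxiv

Summary

Traditionally, neighborhood studies have used interviews, surveys, and manual image annotation guided by detailed protocols. These methods identify environmental characteristics such as physical disorder, decay, street safety, and sociocultural symbols, and examine their impact on developmental and health outcomes.
While these methods yield rich insights, they are time-consuming and require intensive expert intervention.
Recent technological advances, including vision-language models (VLMs), have begun to automate parts of this process.
However, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts.
In this demo paper, we present StreetLens, a human-centered, researcher-configurable workflow that embeds relevant social science expertise in a VLM for scalable neighborhood environmental assessments.
StreetLens mimics the process of trained human coders. It grounds the analysis in questions derived from established interview protocols, retrieves relevant street view imagery (SVI), and generates a wide spectrum of semantic annotations, from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image).
By letting researchers define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process.
It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed across diverse settings.
We provide a Google Colab notebook to make StreetLens accessible and extensible for researchers working with public or custom SVI datasets.
StreetLens represents a shift toward flexible, agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies.

Summary (original)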

Traditionally, neighborhood studies have employed interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. While these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision-language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this demo paper, we present StreetLens, a human-centered, researcher-configurable workflow that embeds relevant social science expertise in a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by grounding the analysis in questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM’s role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed across diverse settings. We provide a Google Colab notebook to make StreetLens accessible and extensible for researchers working with public or custom SVI datasets. StreetLens represents a shift toward flexible, agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies.

arXiv information

Authors	Jina Kim, Leeje Jang, Yao-Yi Chiang, Guanyu Wang, Michelle Pasco
Published	2025-06-17 16:06:03+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.HC

Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach

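Posted: June 18, 2025 | Author: jarxiv

Summary

This paper presents a human-centered, real-time, user-adaptive speech-to-sign-language animation system that integrates Transformer-based motion generation with a transparent, user-editable JSON intermediate layer.
The framework overcomes key limitations of prior sign language technologies by letting users directly inspect and modify sign segments, thereby enhancing naturalness, expressiveness, and user agency.
Leveraging a streaming Conformer encoder and an autoregressive Transformer-MDN decoder, the system synchronizes spoken input with upper-body and facial motion for 3D avatar rendering.
Edits and user ratings feed into a human-in-the-loop optimization loop for continuous improvement.
Experiments with 20 deaf signers and 5 interpreters show that the editable interface and participatory feedback significantly improve comprehension, naturalness, usability, and trust, while lowering cognitive load.
With sub-20 ms per-frame inference on standard hardware, the system is ready for real-time communication and education.
This work shows how technical and participatory innovation together enable accessible, explainable, and user-adaptive AI for sign language technology.

Summary (original)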

This paper presents a human-centered, real-time, user-adaptive speech-to-sign language animation system that integrates Transformer-based motion generation with a transparent, user-editable JSON intermediate layer. The framework overcomes key limitations in prior sign language technologies by enabling direct user inspection and modification of sign segments, thus enhancing naturalness, expressiveness, and user agency. Leveraging a streaming Conformer encoder and autoregressive Transformer-MDN decoder, the system synchronizes spoken input into upper-body and facial motion for 3D avatar rendering. Edits and user ratings feed into a human-in-the-loop optimization loop for continuous improvement. Experiments with 20 deaf signers and 5 interpreters show that the editable interface and participatory feedback significantly improve comprehension, naturalness, usability, and trust, while lowering cognitive load. With sub-20 ms per-frame inference on standard hardware, the system is ready for real-time communication and education. This work illustrates how technical and participatory innovation together enable accessible, explainable, and user-adaptive AI for sign language technology.
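
The abstract does not give the schema of the user-editable JSON intermediate layer. The sketch below is a hypothetical example of what such a layer could look like. It holds a list of sign segments (gloss, timing, facial-expression tag) and serializes them to JSON for a user to edit. The edited JSON is then parsed and validated before being passed on. The field names, the `SignSegment` class, and the validation rules are assumptions, not the paper's format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SignSegment:
    gloss: str                   # hypothetical sign label, e.g. "HELLO"
    start_ms: int                # segment start on the avatar timeline
    end_ms: int                  # segment end on the avatar timeline
    expression: str = "neutral"  # hypothetical facial-expression tag


def to_json(segments: List[SignSegment]) -> str:
    """Serialize segments to human-editable JSON."""
    return json.dumps([asdict(s) for s in segments], indent=2)


def from_json(text: str) -> List[SignSegment]:
    """Parse user-edited JSON and reject malformed timing before rendering."""
    segments = [SignSegment(**item) for item in json.loads(text)]
    for seg in segments:
        if seg.end_ms <= seg.start_ms:
            raise ValueError(f"segment {seg.gloss!r} has non-positive duration")
    for prev, cur in zip(segments, segments[1:]):
        if cur.start_ms < prev.end_ms:
            raise ValueError(f"segment {cur.gloss!r} overlaps {prev.gloss!r}")
    return segments


# Usage: a user renames the first sign in the JSON, and the edit is re-validated.
draft = [SignSegment("HELLO", 0, 400), SignSegment("HOW-ARE-YOU", 400, 1100, "question")]
edited = json.loads(to_json(draft))
edited[0]["gloss"] = "HI"
print(from_json(json.dumps(edited))[0])
# SignSegment(gloss='HI', start_ms=0, end_ms=400, expression='neutral')
```

Validation runs at the parse step, so an edit that breaks the timeline is rejected before it reaches the motion decoder.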

arXiv information

Authors	Yingchao Li
Published	2025-06-17 16:08:48+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.HC

A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI

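Posted: June 18, 2025 | Author: jarxiv

Summary

This article introduces a conjecture that formalises a fundamental trade-off between provable correctness and broad data-mapping capacity in artificial intelligence (AI) systems.
When an AI system is engineered for deductively watertight guarantees (demonstrable certainty about the error-free nature of its outputs), as in classical symbolic AI, its operational domain must be narrowly circumscribed and pre-structured.
Conversely, a system that takes high-dimensional data as input and produces rich information outputs, as contemporary generative models do, necessarily gives up the possibility of zero-error performance. It incurs an irreducible risk of errors or misclassification.
By making this previously implicit trade-off explicit and open to rigorous verification, the conjecture significantly reframes both engineering ambitions and philosophical expectations for AI.
After reviewing the historical motivations for this tension, the article states the conjecture in information-theoretic form and situates it within broader debates in epistemology, formal verification, and the philosophy of technology.
It then analyses the conjecture's implications and consequences, drawing on the notions of underdetermination, prudent epistemic risk, and moral responsibility.
The discussion clarifies how, if correct, the conjecture would help reshape evaluation standards, governance frameworks, and hybrid system design.
The conclusion underscores the importance of eventually proving or refuting the inequality for the future of trustworthy AI.

Summary (original)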

This article introduces a conjecture that formalises a fundamental trade-off between provable correctness and broad data-mapping capacity in Artificial Intelligence (AI) systems. When an AI system is engineered for deductively watertight guarantees (demonstrable certainty about the error-free nature of its outputs) — as in classical symbolic AI — its operational domain must be narrowly circumscribed and pre-structured. Conversely, a system that can input high-dimensional data to produce rich information outputs — as in contemporary generative models — necessarily relinquishes the possibility of zero-error performance, incurring an irreducible risk of errors or misclassification. By making this previously implicit trade-off explicit and open to rigorous verification, the conjecture significantly reframes both engineering ambitions and philosophical expectations for AI. After reviewing the historical motivations for this tension, the article states the conjecture in information-theoretic form and contextualises it within broader debates in epistemology, formal verification, and the philosophy of technology. It then offers an analysis of its implications and consequences, drawing on notions of underdetermination, prudent epistemic risk, and moral responsibility. The discussion clarifies how, if correct, the conjecture would help reshape evaluation standards, governance frameworks, and hybrid system design. The conclusion underscores the importance of eventually proving or refuting the inequality for the future of trustworthy AI.

arXiv information

Authors	Luciano Floridi
Published	2025-06-17 16:13:48+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI

Unified Software Engineering agent as AI Software Engineer

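Posted: June 18, 2025 | Author: jarxiv

Summary

The growth of large language model (LLM) technology has raised expectations for automated coding.
However, software engineering is more than coding; it also covers activities such as project maintenance and evolution.
In this context, the concept of LLM agents has gained traction. These agents use LLMs as reasoning engines to invoke external tools autonomously.
But is an LLM agent the same as an AI software engineer?
In this paper, we explore this question by developing a Unified Software Engineering agent, or USEagent.
Existing work builds specialized agents for specific software tasks such as testing, debugging, and repair. Our goal, by contrast, is a unified agent that can orchestrate and handle multiple capabilities.
This gives the agent the promise of handling complex scenarios in software development, such as fixing an incomplete patch, adding new features, or taking over code written by others.
We envision USEagent as the first draft of a future AI software engineer that can be a team member in future software development teams involving both AI and humans.
To evaluate the efficacy of USEagent, we build a Unified Software Engineering bench (USEbench) comprising myriad tasks such as coding, testing, and patching.
USEbench is a judicious mixture of tasks from existing benchmarks such as SWE-bench, SWT-bench, and REPOCOD.
USEbench consists of 1,271 repository-level software engineering tasks. In an evaluation on it, USEagent shows improved efficacy compared to existing general agents such as OpenHands CodeActAgent.
Gaps remain in USEagent's capabilities for certain coding tasks, and these gaps provide hints for further developing the AI software engineer of the future.

Summary (original)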

The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenance and evolution of a project. In this context, the concept of LLM agents has gained traction, which utilize LLMs as reasoning engines to invoke external tools autonomously. But is an LLM agent the same as an AI software engineer? In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent. Unlike existing work which builds specialized agents for specific software tasks such as testing, debugging, and repair, our goal is to build a unified agent which can orchestrate and handle multiple capabilities. This gives the agent the promise of handling complex scenarios in software development such as fixing an incomplete patch, adding new features, or taking over code written by others. We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans. To evaluate the efficacy of USEagent, we build a Unified Software Engineering bench (USEbench) comprising of myriad tasks such as coding, testing, and patching. USEbench is a judicious mixture of tasks from existing benchmarks such as SWE-bench, SWT-bench, and REPOCOD. In an evaluation on USEbench consisting of 1,271 repository-level software engineering tasks, USEagent shows improved efficacy compared to existing general agents such as OpenHands CodeActAgent. There exist gaps in the capabilities of USEagent for certain coding tasks, which provides hints on further developing the AI Software Engineer of the future.

arXiv information

Authors	Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury
Published	2025-06-17 16:19:13+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.SE

Agent Laboratory: Using LLM Agents as Research Assistants

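Posted: June 18, 2025 | Author: jarxiv

Summary

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results.
To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process.
The framework accepts a human-provided research idea and progresses through three stages: literature review, experimentation, and report writing. It produces comprehensive research outputs, including a code repository and a research report, and lets users provide feedback and guidance at each stage.
We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality. They participate in a survey, provide human feedback to guide the research process, and then evaluate the final paper. We found the following:
(1) Agent Laboratory driven by o1-preview generates the best research outcomes;
(2) the generated machine learning code achieves state-of-the-art performance compared to existing methods;
(3) human involvement, in the form of feedback at each stage, significantly improves the overall quality of the research;
(4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods.
We hope Agent Laboratory enables researchers to devote more effort to creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.

Summary (original)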

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages–literature review, experimentation, and report writing to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluate the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.

arXiv information

Authors	Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum
Published	2025-06-17 16:19:14+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.HC, cs.LG

Refining music sample identification with a self-supervised graph neural network

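Posted: June 18, 2025 | Author: jarxiv

Summary

Automatic sample identification (ASID) is the detection and identification of portions of audio recordings that have been reused in new musical works. It is an essential but challenging task in audio query-based retrieval.
A related task, audio fingerprinting, has made significant progress in accurately retrieving musical content under "real-world" (noisy, reverberant) conditions. ASID systems, however, struggle to identify samples that have undergone musical modifications.
A system robust to common music production transformations, such as time-stretching, pitch-shifting, effects processing, and underlying or overlaying music, therefore remains an important open challenge.
In this work, we propose a lightweight and scalable encoding architecture that employs a graph neural network within a contrastive learning framework.
Our model uses only 9% of the trainable parameters of the current state-of-the-art system while achieving comparable performance, reaching a mean average precision (mAP) of 44.2%.
To enhance retrieval quality, we introduce a two-stage approach. An initial coarse similarity search selects candidates, and a cross-attention classifier then rejects irrelevant matches and refines the ranking of the retrieved candidates.
In addition, because real-world queries are often short, we benchmark our system on short queries. For this we use new fine-grained annotations for the Sample100 dataset, which we publish as part of this work.

Summary (original)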

Automatic sample identification (ASID), the detection and identification of portions of audio recordings that have been reused in new musical works, is an essential but challenging task in the field of audio query-based retrieval. While a related task, audio fingerprinting, has made significant progress in accurately retrieving musical content under ‘real world’ (noisy, reverberant) conditions, ASID systems struggle to identify samples that have undergone musical modifications. Thus, a system robust to common music production transformations such as time-stretching, pitch-shifting, effects processing, and underlying or overlaying music is an important open challenge. In this work, we propose a lightweight and scalable encoding architecture employing a Graph Neural Network within a contrastive learning framework. Our model uses only 9% of the trainable parameters compared to the current state-of-the-art system while achieving comparable performance, reaching a mean average precision (mAP) of 44.2%. To enhance retrieval quality, we introduce a two-stage approach consisting of an initial coarse similarity search for candidate selection, followed by a cross-attention classifier that rejects irrelevant matches and refines the ranking of retrieved candidates – an essential capability absent in prior models. In addition, because queries in real-world applications are often short in duration, we benchmark our system for short queries using new fine-grained annotations for the Sample100 dataset, which we publish as part of this work.
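
The two-stage retrieval idea and the mAP metric can be sketched in a few lines. In the first stage, a coarse similarity search selects candidates; in the second, a model rejects some of them and re-ranks the rest. This is a hedged illustration only. Plain cosine similarity over toy vectors stands in for the paper's GNN embeddings, and an arbitrary scoring callable stands in for its cross-attention classifier. `two_stage_search`, `k`, `threshold`, and the toy data are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_stage_search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    rerank: Callable[[Sequence[float], str], float],
    k: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    # Stage 1: coarse candidate selection by embedding similarity (top-k).
    coarse = sorted(index, key=lambda rid: cosine(query, index[rid]), reverse=True)[:k]
    # Stage 2: rescore candidates, reject those below threshold, re-rank the rest.
    scored = [(rid, rerank(query, rid)) for rid in coarse]
    kept = [(rid, s) for rid, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP for one query; mAP is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for rank, rid in enumerate(ranked, start=1):
        if rid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical 2-D embeddings and second-stage scores.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
stage2_scores = {"a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8}
results = two_stage_search([1.0, 0.0], index, lambda q, rid: stage2_scores[rid])
print(results)  # [('b', 0.9)]
```

In the toy run, `d` never reaches the second stage because the coarse search keeps only the top `k` candidates. The second-stage score rejects `a` even though it is the closest embedding.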

arXiv information

Authors	Aditya Bhattacharjee, Ivan Meresman Higgs, Mark Sandler, Emmanouil Benetos
Published	2025-06-17 16:19:21+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.IR, cs.SD, H.5.5
