jarxiv | Japanese arxiv | Page 184

Intentionally Unintentional: GenAI Exceptionalism and the First Amendment

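Posted: June 6, 2025 | Author: jarxiv

Summary

This paper challenges the assumption that courts should grant First Amendment protection to outputs from large generative AI models such as GPT-4 and Gemini.
It argues that because these models lack intentionality, their outputs do not constitute speech as understood under established legal precedent, so there is no speech to protect.
Furthermore, if model outputs are not speech, users cannot claim a First Amendment speech right to receive them.
It also argues that extending First Amendment rights to AI models would not serve the fundamental purposes of free speech, such as promoting a marketplace of ideas, facilitating self-governance, or fostering self-expression.
In fact, granting First Amendment protection to AI models would harm society, because it would hinder the government's ability to regulate these powerful technologies effectively and could lead to the unchecked spread of misinformation and other harms.

Summary (original)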

This paper challenges the assumption that courts should grant First Amendment protections to outputs from large generative AI models, such as GPT-4 and Gemini. We argue that because these models lack intentionality, their outputs do not constitute speech as understood in the context of established legal precedent, so there can be no speech to protect. Furthermore, if the model outputs are not speech, users cannot claim a First Amendment speech right to receive the outputs. We also argue that extending First Amendment rights to AI models would not serve the fundamental purposes of free speech, such as promoting a marketplace of ideas, facilitating self-governance, or fostering self-expression. In fact, granting First Amendment protections to AI models would be detrimental to society because it would hinder the government’s ability to regulate these powerful technologies effectively, potentially leading to the unchecked spread of misinformation and other harms.

arXiv information

Authors	David Atkinson, Jena D. Hwang, Jacob Morrison
Published	2025-06-05 16:26:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CY

LLM-First Search: Self-Guided Exploration of the Solution Space

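Posted: June 6, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have shown remarkable improvements in reasoning and planning through increased test-time compute, often by framing problem solving as a search process.
Methods such as Monte Carlo Tree Search (MCTS) have proven effective in some domains, but their reliance on fixed exploration hyperparameters limits their adaptability across tasks of varying difficulty and makes them impractical or expensive in certain settings.
This paper proposes LLM-First Search (LFS), a novel LLM self-guided search method that removes the need for predefined search strategies by letting the LLM control the search process autonomously through self-guided exploration.
Rather than relying on external heuristics or hardcoded policies, the LLM evaluates whether to pursue the current search path or explore alternative branches based on its internal scoring mechanisms.
This enables more flexible, context-sensitive reasoning without manual tuning or task-specific adaptation.
LFS is evaluated on Countdown and Sudoku against three classic, widely used search algorithms: Tree-of-Thoughts' Breadth-First Search (ToT-BFS), Best-First Search (BestFS), and MCTS, each of which has been used to achieve SotA results on a range of challenging reasoning tasks.
The authors find that LFS (1) performs better on more challenging tasks without additional tuning, (2) is more computationally efficient than the other methods, especially when powered by a stronger model, (3) scales better with stronger models thanks to its LLM-first design, and (4) scales better with an increased compute budget.
The code is publicly available at https://github.com/NathanHerr/LLM-First-Search .

Summary (original)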

Large Language Models (LLMs) have demonstrated remarkable improvements in reasoning and planning through increased test-time compute, often by framing problem-solving as a search process. While methods like Monte Carlo Tree Search (MCTS) have proven effective in some domains, their reliance on fixed exploration hyperparameters limits their adaptability across tasks of varying difficulty, rendering them impractical or expensive in certain settings. In this paper, we propose LLM-First Search (LFS), a novel LLM Self-Guided Search method that removes the need for pre-defined search strategies by empowering the LLM to autonomously control the search process via self-guided exploration. Rather than relying on external heuristics or hardcoded policies, the LLM evaluates whether to pursue the current search path or explore alternative branches based on its internal scoring mechanisms. This enables more flexible and context-sensitive reasoning without requiring manual tuning or task-specific adaptation. We evaluate LFS on Countdown and Sudoku against three classic widely-used search algorithms, Tree-of-Thoughts’ Breadth First Search (ToT-BFS), Best First Search (BestFS), and MCTS, each of which has been used to achieve SotA results on a range of challenging reasoning tasks. We found that LFS (1) performs better on more challenging tasks without additional tuning, (2) is more computationally efficient compared to the other methods, especially when powered by a stronger model, (3) scales better with stronger models, due to its LLM-First design, and (4) scales better with increased compute budget. Our code is publicly available at https://github.com/NathanHerr/LLM-First-Search .
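
A minimal sketch of the self-guided loop this abstract describes, under loud assumptions: `score` and `should_explore` are hypothetical stand-ins for the LLM calls (here plain numeric heuristics), and the toy task (reach a target integer using +1 and ×2 moves) is not one of the paper's benchmarks. It only illustrates the control flow in which the model, rather than a fixed exploration hyperparameter, chooses between continuing the current path and switching to a parked alternative. It is not the authors' implementation.

```python
import heapq
from typing import Callable, List, Optional


def llm_first_search(
    start: int,
    is_goal: Callable[[int], bool],
    expand: Callable[[int], List[int]],
    score: Callable[[int], float],                   # hypothetical stand-in for LLM scoring
    should_explore: Callable[[float, float], bool],  # hypothetical stand-in for LLM decision
    max_steps: int = 1000,
) -> Optional[List[int]]:
    frontier: list = []  # parked alternatives: (-score, tie_breaker, path)
    tie = 0
    path = [start]
    for _ in range(max_steps):
        if is_goal(path[-1]):
            return path
        children = sorted(expand(path[-1]), key=score, reverse=True)
        for child in children[1:]:  # park every child except the best one
            tie += 1
            heapq.heappush(frontier, (-score(child), tie, path + [child]))
        best_child = children[0] if children else None
        best_alt = -frontier[0][0] if frontier else None
        keep_going = best_child is not None and (
            best_alt is None or not should_explore(score(best_child), best_alt)
        )
        if keep_going:
            path = path + [best_child]
        elif frontier:
            if best_child is not None:  # park it too, then switch branch
                tie += 1
                heapq.heappush(frontier, (-score(best_child), tie, path + [best_child]))
            path = heapq.heappop(frontier)[2]
        else:
            return None  # dead end and nothing left to explore
    return None


# Toy problem (an assumption for illustration only).
TARGET = 10


def toy_expand(n: int) -> List[int]:
    return [c for c in (n + 1, n * 2) if c <= TARGET]


def toy_score(n: int) -> float:
    return -abs(TARGET - n)


def toy_should_explore(current_score: float, alt_score: float) -> bool:
    return alt_score > current_score


toy_path = llm_first_search(1, lambda n: n == TARGET, toy_expand, toy_score, toy_should_explore)
print(toy_path)
```

In a real system the two callbacks would be prompts to the model; keeping them as injected functions makes the search loop itself easy to test in isolation.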

arXiv information

Authors	Nathan Herr, Tim Rocktäschel, Roberta Raileanu
Published	2025-06-05 16:27:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning

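Posted: June 6, 2025 | Author: jarxiv

Summary

Graph Neural Networks (GNNs) often suffer from degree bias in node classification tasks, where prediction performance varies across nodes of different degrees.
Several approaches that adopt Graph Contrastive Learning (GCL) have been proposed to mitigate this bias.
However, the limited number of positive pairs and the equal weighting of all positives and negatives in GCL still leave low-degree nodes with insufficient and noisy information.
This paper proposes the Hardness Adaptive Reweighted (HAR) contrastive loss to mitigate degree bias.
HAR adds more positive pairs by leveraging node labels and adaptively weights positive and negative pairs according to their learning hardness.
In addition, the authors develop an experimental framework named SHARP to extend HAR to a broader range of scenarios.
Both theoretical analysis and experiments validate the effectiveness of SHARP.
Experimental results across four datasets show that SHARP outperforms baselines at both the global and degree levels.

Summary (original)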

Graph Neural Networks (GNNs) often suffer from degree bias in node classification tasks, where prediction performance varies across nodes with different degrees. Several approaches, which adopt Graph Contrastive Learning (GCL), have been proposed to mitigate this bias. However, the limited number of positive pairs and the equal weighting of all positives and negatives in GCL still lead to low-degree nodes acquiring insufficient and noisy information. This paper proposes the Hardness Adaptive Reweighted (HAR) contrastive loss to mitigate degree bias. It adds more positive pairs by leveraging node labels and adaptively weights positive and negative pairs based on their learning hardness. In addition, we develop an experimental framework named SHARP to extend HAR to a broader range of scenarios. Both our theoretical analysis and experiments validate the effectiveness of SHARP. The experimental results across four datasets show that SHARP achieves better performance against baselines at both global and degree levels.

arXiv information

Authors	Jingyu Hu, Hongbo Bo, Jun Hong, Xiaowei Liu, Weiru Liu
Published	2025-06-05 16:28:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

The Lessons of Developing Process Reward Models in Mathematical Reasoning

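Posted: June 6, 2025 | Author: jarxiv

Summary

Process Reward Models (PRMs) are emerging as a promising approach to process supervision in the mathematical reasoning of Large Language Models (LLMs), aiming to identify and mitigate intermediate errors in the reasoning process.
However, developing effective PRMs faces significant challenges, particularly in data annotation and evaluation methodology.
Through extensive experiments, the paper demonstrates that the commonly used Monte Carlo (MC) estimation-based data synthesis for PRMs typically yields inferior performance and generalization compared with LLM-as-a-judge and human annotation methods.
MC estimation relies on completion models to evaluate current-step correctness, which leads to inaccurate step verification.
The paper also identifies potential biases in the conventional Best-of-N (BoN) evaluation strategy for PRMs: (1) unreliable policy models generate responses with correct answers but flawed processes, causing a misalignment between the BoN evaluation criterion and the PRM objective of process verification.
(2) The tolerance of PRMs for such responses leads to inflated BoN scores.
(3) Existing PRMs have a significant proportion of their minimum scores concentrated on the final answer steps, revealing a shift from process-based to outcome-based assessment in BoN-optimized PRMs.
To address these challenges, the authors develop a consensus filtering mechanism that effectively integrates MC estimation with LLM-as-a-judge, and advocate a more comprehensive evaluation framework that combines response-level and step-level metrics.
Based on these mechanisms, they significantly improve both model performance and data efficiency in BoN evaluation and in the step-wise error identification task.
Finally, they release a new state-of-the-art PRM that outperforms existing open-source alternatives and provide practical guidelines for future research on building process supervision models.

Summary (original)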

Process Reward Models (PRMs) emerge as a promising approach for process supervision in mathematical reasoning of Large Language Models (LLMs), which aim to identify and mitigate intermediate errors in the reasoning processes. However, the development of effective PRMs faces significant challenges, particularly in data annotation and evaluation methodologies. In this paper, through extensive experiments, we demonstrate that commonly used Monte Carlo (MC) estimation-based data synthesis for PRMs typically yields inferior performance and generalization compared to LLM-as-a-judge and human annotation methods. MC estimation relies on completion models to evaluate current-step correctness, leading to inaccurate step verification. Furthermore, we identify potential biases in conventional Best-of-N (BoN) evaluation strategies for PRMs: (1) The unreliable policy models generate responses with correct answers but flawed processes, leading to a misalignment between the evaluation criteria of BoN and the PRM objectives of process verification. (2) The tolerance of PRMs for such responses leads to inflated BoN scores. (3) Existing PRMs have a significant proportion of minimum scores concentrated on the final answer steps, revealing the shift from process to outcome-based assessment in BoN-optimized PRMs. To address these challenges, we develop a consensus filtering mechanism that effectively integrates MC estimation with LLM-as-a-judge and advocate a more comprehensive evaluation framework that combines response-level and step-level metrics. Based on the mechanisms, we significantly improve both model performance and data efficiency in the BoN evaluation and the step-wise error identification task. Finally, we release a new state-of-the-art PRM that outperforms existing open-source alternatives and provide practical guidelines for future research in building process supervision models.
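
The consensus filtering idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the sample layout (`mc_rates`, `judge_first_error`), the zero-success threshold, and the rule "keep a sample only when both annotators locate the same first erroneous step" are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional


def first_error_from_mc(step_success_rates: List[float], threshold: float = 0.0) -> Optional[int]:
    """Index of the first step whose rollout success rate is <= threshold, else None."""
    for index, rate in enumerate(step_success_rates):
        if rate <= threshold:
            return index
    return None


def consensus_filter(samples: List[Dict]) -> List[Dict]:
    """Keep samples where MC estimation and the LLM judge agree on the first error."""
    kept = []
    for sample in samples:
        mc_error = first_error_from_mc(sample["mc_rates"])
        if mc_error == sample["judge_first_error"]:
            kept.append({"id": sample["id"], "first_error": mc_error})
    return kept


# Hypothetical annotations: per-step MC success rates plus the judge's verdict.
demo_samples = [
    {"id": "a", "mc_rates": [1.0, 0.5, 0.0], "judge_first_error": 2},  # agree on step 2
    {"id": "b", "mc_rates": [1.0, 1.0], "judge_first_error": 1},       # disagree, dropped
    {"id": "c", "mc_rates": [0.8, 0.6], "judge_first_error": None},    # agree: no error
]
filtered = consensus_filter(demo_samples)
print(filtered)
```

Filtering on agreement trades data volume for label quality, which matches the data-efficiency claim in the abstract only in spirit; the actual thresholds would have to come from the paper.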

arXiv information

Authors	Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin
Published	2025-06-05 16:34:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?

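Posted: June 6, 2025 | Author: jarxiv

Summary

This work identifies the '2D-Cheating' problem in 3D LLM evaluation: the tasks can be solved easily by VLMs given rendered images of point clouds, which exposes the ineffective evaluation of 3D LLMs' unique 3D capabilities.
The authors test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding.
They also advocate explicitly separating 3D abilities from 1D or 2D aspects when evaluating 3D LLMs.
Code and data are available at https://github.com/LLM-class-group/Revisiting-3D-LLM-Benchmarks .

Summary (original)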

In this work, we identify the ‘2D-Cheating’ problem in 3D LLM evaluation, where these tasks might be easily solved by VLMs with rendered images of point clouds, exposing ineffective evaluation of 3D LLMs’ unique 3D capabilities. We test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities from 1D or 2D aspects when evaluating 3D LLMs. Code and data are available at https://github.com/LLM-class-group/Revisiting-3D-LLM-Benchmarks .

arXiv information

Authors	Jiahe Jin, Yanheng He, Mingyan Yang
Published	2025-06-05 16:35:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

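Posted: June 6, 2025 | Author: jarxiv

Summary

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention.
Although widely adopted, transformers require memory and compute that scale linearly during inference.
A recent stream of work linearized the softmax operation, yielding powerful recurrent neural network (RNN) models with constant memory and compute costs, such as DeltaNet, Mamba, and xLSTM.
These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective that is approximately optimized through an online learning rule.
The paper joins this line of work and introduces a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), studying it in language modeling at the billion-parameter scale.
This layer again stems from an in-context loss, but the loss is now minimized to optimality at every time point using a fast conjugate gradient solver.
Through an extensive suite of experiments, the authors show that optimal test-time training reaches lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks that require long-context understanding.
This performance gain comes at the cost of additional FLOPs spent at inference time.
The results therefore relate to the recent trend of increasing test-time compute to improve performance, here by spending compute to solve sequential optimization problems within the neural network itself.

Summary (original)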

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), and study it in language modeling at the billion-parameter scale. This layer again stems from an in-context loss, but which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional flops spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance — here by spending compute to solve sequential optimization problems within the neural network itself.
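
To make "minimized to optimality at every time point using a conjugate gradient solver" concrete, here is a small pure-Python sketch. It assumes the in-context loss is a ridge regression from keys to values, so that the output at step t is G_t (H_t + λI)^{-1} q_t with H_t = Σ k kᵀ and G_t = Σ v kᵀ. The regulariser `lam`, the function names, and the absence of gating or chunkwise parallelism are simplifying assumptions; this is not the MesaNet implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def matvec(a: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in a]


def conjugate_gradient(a: Matrix, b: Vector, iters: int = 50, tol: float = 1e-12) -> Vector:
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - a @ x, with x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        ap = matvec(a, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def mesa_layer(keys: Matrix, values: Matrix, queries: Matrix, lam: float = 1e-3) -> Matrix:
    """Causal pass: at each step, solve the ridge system with CG and read out."""
    d_k, d_v = len(keys[0]), len(values[0])
    h = [[lam if i == j else 0.0 for j in range(d_k)] for i in range(d_k)]  # λI + Σ k kᵀ
    g = [[0.0] * d_k for _ in range(d_v)]                                   # Σ v kᵀ
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for i in range(d_k):
            for j in range(d_k):
                h[i][j] += k[i] * k[j]
        for i in range(d_v):
            for j in range(d_k):
                g[i][j] += v[i] * k[j]
        x = conjugate_gradient(h, q)   # x = (H_t + λI)^{-1} q_t
        outputs.append(matvec(g, x))   # o_t = G_t x
    return outputs


# Orthonormal keys: querying with a stored key should recall its value.
demo_out = mesa_layer(
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[2.0], [3.0]],
    queries=[[1.0, 0.0], [0.0, 1.0]],
    lam=1e-6,
)
print(demo_out)
```

The extra inference cost mentioned in the abstract shows up here as the CG iterations run at every time step, on top of the constant-size state updates.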

arXiv information

Authors	Johannes von Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento
Published	2025-06-05 16:50:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress

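Posted: June 6, 2025 | Author: jarxiv

Summary

Regulatory efforts to govern large language model (LLM) development have focused predominantly on restricting access to high-performance computational resources.
This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments.
It proposes a novel framework that distinguishes compute-dependent innovations, which yield disproportionate benefits at high compute, from compute-independent innovations, which improve efficiency across compute scales.
The impact is quantified using Compute-Equivalent Gain (CEG).
Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains across the tested scales (for example, a combined CEG of up to 3.5×).
In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their definition of yielding their primary benefits at higher compute.
Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements.
The authors therefore argue that effective AI oversight must move beyond a singular focus on hardware and incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research.
The proposed framework also serves as an analytical tool for forecasting AI progress.

Summary (original)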

Regulatory efforts to govern large language model (LLM) development have predominantly focused on restricting access to high-performance computational resources. This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations, which yield disproportionate benefits at high compute, from compute-independent innovations, which improve efficiency across compute scales. The impact is quantified using Compute-Equivalent Gain (CEG). Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains (e.g., with combined CEG up to 3.5×) across the tested scales. In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales, but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their definition of yielding primary benefits at higher compute. Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements. We argue that effective AI oversight must therefore incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research, moving beyond a singular focus on hardware. The proposed framework also serves as an analytical tool for forecasting AI progress.
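
The abstract does not spell out how Compute-Equivalent Gain is estimated, so the following is only one plausible reading, stated as an assumption: CEG is the compute a baseline would need to match the improved model's loss, divided by the compute the improved model actually used. The power-law baseline curve L(C) = a · C^(-b) and the constants below are illustrative, not taken from the paper.

```python
def compute_equivalent_gain(loss_with_innovation: float, compute_used: float,
                            a: float, b: float) -> float:
    """CEG under an assumed baseline scaling curve L(C) = a * C ** (-b).

    Inverts the curve to find the compute at which the baseline would reach
    `loss_with_innovation`, then divides by the compute actually spent.
    """
    baseline_compute_needed = (a / loss_with_innovation) ** (1.0 / b)
    return baseline_compute_needed / compute_used


# Illustrative numbers only: the baseline reaches loss 1.0 at 100 compute units.
A, B = 10.0, 0.5
ceg_improved = compute_equivalent_gain(loss_with_innovation=0.5, compute_used=100.0, a=A, b=B)
ceg_baseline = compute_equivalent_gain(loss_with_innovation=1.0, compute_used=100.0, a=A, b=B)
print(ceg_improved, ceg_baseline)
```

Under this reading a CEG of 1 means no gain over the baseline, and values above 1 say how many times more compute the baseline would have needed.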

arXiv information

Authors	Jack Sanderson, Teddy Foley, Spencer Guo, Anqi Qu, Henry Josephson
Published	2025-06-05 17:09:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, I.2

From Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors

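Posted: June 6, 2025 | Author: jarxiv

Summary

Current studies have exposed the risk that Large Language Models (LLMs) generate harmful content under jailbreak attacks.
However, they overlook that directly generating harmful content from scratch is more difficult than inducing an LLM to calibrate benign content into harmful forms.
This study introduces a novel attack framework that exploits AdVersArial meTAphoR (AVATAR) to induce the LLM to calibrate malicious metaphors for jailbreaking.
Specifically, to answer harmful queries, AVATAR adaptively identifies a set of benign but logically related metaphors as the initial seed.
Then, driven by these metaphors, the target LLM is induced to reason about and calibrate the metaphorical content, and is thus jailbroken either by directly outputting harmful responses or by calibrating the residuals between metaphorical and professional harmful content.
Experimental results demonstrate that AVATAR can jailbreak LLMs effectively and transferably, achieving a state-of-the-art attack success rate across multiple advanced LLMs.

Summary (original)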

Current studies have exposed the risk of Large Language Models (LLMs) generating harmful content by jailbreak attacks. However, they overlook that the direct generation of harmful content from scratch is more difficult than inducing an LLM to calibrate benign content into harmful forms. In our study, we introduce a novel attack framework that exploits AdVersArial meTAphoR (AVATAR) to induce the LLM to calibrate malicious metaphors for jailbreaking. Specifically, to answer harmful queries, AVATAR adaptively identifies a set of benign but logically related metaphors as the initial seed. Then, driven by these metaphors, the target LLM is induced to reason about and calibrate the metaphorical content, thus jailbroken by either directly outputting harmful responses or calibrating residuals between metaphorical and professional harmful content. Experimental results demonstrate that AVATAR can effectively and transferably jailbreak LLMs and achieve a state-of-the-art attack success rate across multiple advanced LLMs.

arXiv information

Authors	Yu Yan, Sheng Sun, Zenghao Duan, Teli Liu, Min Liu, Zhiyi Yin, Jiangyu Lei, Qi Li
Published	2025-06-05 17:10:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CR

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

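Posted: June 6, 2025 | Author: jarxiv

Summary

Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems.
Existing solutions, including supervised fine-tuning on shorter traces, user-controlled budgets, and RL with uniform penalties, either require data curation or manual configuration, or treat all problems alike regardless of difficulty.
The paper introduces Adaptive Length Penalty (ALP), a reinforcement learning objective that tailors generation length to the per-prompt solve rate.
During training, ALP monitors each prompt's online solve rate through multiple rollouts and adds a differentiable penalty whose magnitude scales inversely with that rate, so that confident (easy) prompts incur a high cost for extra tokens while hard prompts remain unhindered.
Post-training DeepScaleR-1.5B with ALP cuts average token usage by 50% without significantly reducing performance.
Relative to fixed-budget and uniform-penalty baselines, ALP redistributes its reduced budget more intelligently, cutting compute on easy prompts and reallocating the saved tokens to difficult ones, which delivers higher accuracy on the hardest problems at a higher cost.

Summary (original)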

Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems. Existing solutions, including supervised finetuning on shorter traces, user-controlled budgets, or RL with uniform penalties, either require data curation, manual configuration, or treat all problems alike regardless of difficulty. We introduce Adaptive Length Penalty (ALP), a reinforcement learning objective tailoring generation length to per-prompt solve rate. During training, ALP monitors each prompt’s online solve rate through multiple rollouts and adds a differentiable penalty whose magnitude scales inversely with that rate, so confident (easy) prompts incur a high cost for extra tokens while hard prompts remain unhindered. Posttraining DeepScaleR-1.5B with ALP cuts average token usage by 50% without significantly dropping performance. Relative to fixed-budget and uniform penalty baselines, ALP redistributes its reduced budget more intelligently by cutting compute on easy prompts and reallocating saved tokens to difficult ones, delivering higher accuracy on the hardest problems with higher cost.
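
A rough sketch of a solve-rate-dependent length penalty, written to reproduce the behaviour the abstract describes (easy prompts pay a high price per extra token, hard prompts are left almost unpenalised). The reward form below, correctness minus β × normalised length × max(solve rate, 1/K), together with the values of `beta` and `max_len`, is an assumption for illustration and may differ from the paper's objective.

```python
from typing import List


def alp_rewards(correct: List[bool], lengths: List[int], max_len: int,
                beta: float = 0.5) -> List[float]:
    """Per-rollout rewards for one prompt, given K rollouts of that prompt.

    The prompt's online solve rate is estimated from the K rollouts; the
    length penalty is scaled by that rate, floored at 1/K so it never vanishes.
    """
    k = len(correct)
    solve_rate = sum(correct) / k
    weight = max(solve_rate, 1.0 / k)
    return [float(c) - beta * (n / max_len) * weight for c, n in zip(correct, lengths)]


# Hypothetical rollouts: same lengths, different prompt difficulty.
rollout_lengths = [100, 200, 100, 200]
easy = alp_rewards([True, True, True, True], rollout_lengths, max_len=200)
hard = alp_rewards([False, False, False, True], rollout_lengths, max_len=200)
print(easy)
print(hard)
```

With these numbers a correct 200-token answer loses 0.5 reward on the easy prompt but only 0.125 on the hard one, which is the asymmetry that pushes the policy to shorten easy answers first.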

arXiv information

Authors	Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber
Published	2025-06-05 17:17:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

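Posted: June 6, 2025 | Author: jarxiv

Summary

Effective teamwork is essential across diverse domains.
During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction.
During the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance.
However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored to specific contexts, and fail to account for the dynamic interplay of team members' personalities, evolving goals, and changing individual preferences.
As a result, teams may encounter member dissatisfaction, because purely algorithmic assignments can reduce members' commitment to team goals, or may experience suboptimal engagement because members lack timely, personalized guidance to help them adjust their behaviors and interactions as team dynamics evolve.
Ultimately, these challenges can lead to reduced overall team performance.
The author's Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance.
First, it proposes a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences, ensuring alignment between individual needs and collective team goals to enhance team satisfaction.
Second, it introduces tAIfa (Team AI Feedback Assistant), an AI-powered system that uses large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement.
Finally, it presents PuppeteerLLM, an LLM-based simulation framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination.

Summary (original)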

Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored for specific contexts, failing to account for the dynamic interplay of team members' personalities, evolving goals, and changing individual preferences. Therefore, teams may encounter member dissatisfaction, as purely algorithmic assignments can reduce members' commitment to team goals or experience suboptimal engagement due to the absence of timely, personalized guidance to help members adjust their behaviors and interactions as team dynamics evolve. Ultimately, these challenges can lead to reduced overall team performance. My Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance. First, I propose a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences, ensuring alignment between individual needs and collective team goals to enhance team satisfaction. Second, I introduce tAIfa (Team AI Feedback Assistant), an AI-powered system that utilizes large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement. Finally, I present PuppeteerLLM, an LLM-based simulation framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination.

arXiv information

Authors	Mohammed Almutairi
Published	2025-06-05 17:24:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.HC, cs.MA
