jarxiv | Japanese arxiv | Page 929

S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models

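Posted: April 15, 2025 | Author: jarxiv

Summary

We introduce S1-Bench, a new benchmark designed to evaluate the performance of Large Reasoning Models (LRMs) on simple tasks that favor intuitive System 1 thinking over deliberative System 2 reasoning.
LRMs have achieved major breakthroughs on complex reasoning tasks through explicit chains of thought, but their reliance on deep analytical thinking may limit their System 1 thinking capabilities.
Moreover, no benchmark currently exists for evaluating LRM performance on tasks that require such capabilities.
To fill this gap, S1-Bench presents a set of simple, diverse, and naturally clear questions across multiple domains and languages, designed specifically to assess LRM performance on such tasks.
A comprehensive evaluation of 22 LRMs reveals markedly lower efficiency: their outputs are on average 15.5 times longer than those of traditional small LLMs.
In addition, LRMs often identify the correct answer early but continue to deliberate unnecessarily, and some models even produce numerous errors.
These findings highlight the rigid reasoning patterns of current LRMs and the substantial development still needed to achieve balanced dual-system thinking that adapts appropriately to task complexity.

Summary (original)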

We introduce S1-Bench, a novel benchmark designed to evaluate Large Reasoning Models’ (LRMs) performance on simple tasks that favor intuitive system 1 thinking rather than deliberative system 2 reasoning. While LRMs have achieved significant breakthroughs in complex reasoning tasks through explicit chains of thought, their reliance on deep analytical thinking may limit their system 1 thinking capabilities. Moreover, a lack of benchmark currently exists to evaluate LRMs’ performance in tasks that require such capabilities. To fill this gap, S1-Bench presents a set of simple, diverse, and naturally clear questions across multiple domains and languages, specifically designed to assess LRMs’ performance in such tasks. Our comprehensive evaluation of 22 LRMs reveals significant lower efficiency tendencies, with outputs averaging 15.5 times longer than those of traditional small LLMs. Additionally, LRMs often identify correct answers early but continue unnecessary deliberation, with some models even producing numerous errors. These findings highlight the rigid reasoning patterns of current LRMs and underscore the substantial development needed to achieve balanced dual-system thinking capabilities that can adapt appropriately to task complexity.
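The efficiency figure quoted above is a ratio of output lengths. As a rough illustration only, the sketch below computes such a ratio from per-question token counts. The function name `length_ratio` and all numbers are hypothetical, and the abstract does not say how the authors aggregate across models or questions, so this is one plausible reading rather than the paper's procedure.

```python
from statistics import mean


def length_ratio(lrm_lengths, baseline_lengths):
    """Return mean LRM output length divided by mean baseline output length."""
    if not lrm_lengths or not baseline_lengths:
        raise ValueError("both length lists must be non-empty")
    return mean(lrm_lengths) / mean(baseline_lengths)


# Hypothetical token counts for the same four simple questions
# (illustrative values, not data from the paper).
lrm_tokens = [300, 260, 340, 300]
small_llm_tokens = [20, 25, 15, 20]

ratio = length_ratio(lrm_tokens, small_llm_tokens)
print(f"LRM outputs are {ratio:.1f}x longer on average")
```

A ratio well above 1 on questions that need no deliberation is the kind of signal the abstract describes as lower efficiency.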

arXiv information

Authors	Wenyuan Zhang,Shuaiyi Nie,Xinghua Zhang,Zefeng Zhang,Tingwen Liu
Published	2025-04-14 16:13:23+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL | Comments are closed

SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning

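Posted: April 15, 2025 | Author: jarxiv

Summary

Optimizing Register Transfer Level (RTL) code is important for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis.
Manual rewriting guided by synthesis feedback can yield high-quality results, but it is time-consuming and error-prone.
Most existing compiler-based approaches struggle to handle complex design constraints.
Large Language Model (LLM)-based methods have emerged as a promising alternative for addressing these challenges.
However, LLM-based approaches often have difficulty ensuring that the generated code is aligned with the provided prompts.
This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework that seamlessly integrates LLM-based code rewriting with symbolic reasoning techniques.
Our method incorporates a retrieval-augmented generation (RAG) system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors.
A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, enabling fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers.
In addition, a fast verification pipeline that combines formal equivalence checks with test-driven validation further reduces verification complexity.
Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area by up to 43.9%, 62.5%, and 51.1%, respectively, compared with state-of-the-art methods.

Summary (original)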

Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework that seamlessly integrates LLM-based code rewriting with symbolic reasoning techniques. Our method incorporates a retrieval-augmented generation (RAG) system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, a fast verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods.

arXiv information

Authors	Yiting Wang,Wanghao Ye,Ping Guo,Yexiao He,Ziyao Wang,Yexiao He,Bowei Tian,Shwai He,Guoheng Sun,Zheyu Shen,Sihan Chen,Ankur Srivastava,Qingfu Zhang,Gang Qu,Ang Li
Published	2025-04-14 16:15:55+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.AR, cs.LG, cs.PL | Comments are closed

Explanation-Driven Interventions for Artificial Intelligence Model Customization: Empowering End-Users to Tailor Black-Box AI in Rhinocytology

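Posted: April 15, 2025 | Author: jarxiv

Summary

The integration of Artificial Intelligence (AI) into modern society is changing how individuals perform tasks.
In high-risk domains, ensuring human control over AI systems remains a key design challenge.
This article introduces a novel End-User Development (EUD) approach for black-box AI models that lets users edit explanations and influence future predictions through targeted interventions.
By combining explainability, user control, and model adaptability, the proposed method advances Human-Centered AI (HCAI) and promotes a symbiotic relationship between humans and adaptive, user-tailored AI systems.

Summary (original)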

The integration of Artificial Intelligence (AI) in modern society is transforming how individuals perform tasks. In high-risk domains, ensuring human control over AI systems remains a key design challenge. This article presents a novel End-User Development (EUD) approach for black-box AI models, enabling users to edit explanations and influence future predictions through targeted interventions. By combining explainability, user control, and model adaptability, the proposed method advances Human-Centered AI (HCAI), promoting a symbiotic relationship between humans and adaptive, user-tailored AI systems.

arXiv information

Authors	Andrea Esposito,Miriana Calvano,Antonio Curci,Francesco Greco,Rosa Lanzilotti,Antonio Piccinno
Published	2025-04-14 16:21:20+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.HC | Comments are closed

Towards Fairness for the Right Reasons: Using Saliency Maps to Evaluate Bias Removal in Neural Networks

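Posted: April 15, 2025 | Author: jarxiv

Summary

The widespread adoption of machine learning systems has raised serious concerns about fairness and bias, making the mitigation of harmful biases essential to AI development.
In this paper, we investigate the relationship between fairness improvement and the removal of harmful biases in neural networks applied to computer vision tasks.
First, we introduce a set of novel XAI-based metrics that analyze saliency maps to assess shifts in a model's decision-making process.
Next, we demonstrate that successful debiasing methods systematically redirect the model's focus away from protected attributes.
We also show that techniques originally developed for artifact removal can be effectively repurposed for fairness.
These findings underscore the importance of ensuring that models are fair for the right reasons, and contribute to the development of more ethical and trustworthy AI systems.

Summary (original)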

The widespread adoption of machine learning systems has raised critical concerns about fairness and bias, making mitigating harmful biases essential for AI development. In this paper, we investigate the relationship between fairness improvement and the removal of harmful biases in neural networks applied to computer vision tasks. First, we introduce a set of novel XAI-based metrics that analyze saliency maps to assess shifts in a model’s decision-making process. Then, we demonstrate that successful debiasing methods systematically redirect model focus away from protected attributes. Additionally, we show that techniques originally developed for artifact removal can be effectively repurposed for fairness. These findings underscore the importance of ensuring that models are fair for the right reasons, contributing to the development of more ethical and trustworthy AI systems.

arXiv information

Authors	Lukasz Sztukiewicz,Ignacy Stępka,Michał Wiliński,Jerzy Stefanowski
Published	2025-04-14 16:34:51+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CY, cs.LG | Comments are closed

Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech

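Posted: April 15, 2025 | Author: jarxiv

Summary

Examining the factors that counterspeech draws on is central to understanding the best ways to confront hate speech online.
Various studies have assessed the emotional base factors used in counterspeech, such as emotional empathy, offensiveness, and hostility.
To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility, and evaluates their use in two types of conversational interaction, closed (multi-turn) and open (single-turn), concerning racism, sexism, and religious bigotry.
The evaluation covers the distinct behaviors seen in human-sourced as opposed to machine-generated counterspeech.
It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech.
Notably, we observe nuanced differences in the persuasion modes used in open and closed interactions, especially with respect to topic, along with a general tendency to use reason as the persuasion mode for expressing a counterpoint to hate comments.
Machine-generated counterspeech tends to exhibit an emotional persuasion mode, whereas human counterspeech leans toward reason.
Furthermore, our study shows that reason tends to draw more supportive replies than the other persuasion modes.
The findings highlight the potential of incorporating persuasion modes into research on countering hate speech: they can serve as an optimal means of explainability and pave the way for further use of the reply's stance and the role it plays in assessing what constitutes optimal counterspeech.

Summary (original)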

Examining the factors that the counterspeech uses are at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotional base factors used in counter speech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors seen with human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply’s stance and the role it plays in assessing what comprises the optimal counterspeech.

arXiv information

Authors	Ghadi Alyahya,Abeer Aldayel
Published	2025-04-14 16:35:50+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL | Comments are closed

Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain

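Posted: April 15, 2025 | Author: jarxiv

Summary

Achieving robust locomotion on complex terrain remains a challenge because of high-dimensional control and environmental uncertainty.
This paper introduces a teacher prior framework based on the teacher-student paradigm, integrating imitation and auxiliary task learning to improve learning efficiency and generalization.
Unlike traditional paradigms that rely heavily on encoder-based state embeddings, our framework decouples the network design, simplifying the policy network and deployment.
A high-performance teacher policy is first trained with privileged information to acquire generalizable motion skills.
The teacher's motion distribution is then transferred, via a generative adversarial mechanism, to a student policy that relies only on noisy proprioceptive data, mitigating the performance degradation caused by distributional shift.
In addition, auxiliary task learning enhances the student policy's feature representation, speeding up convergence and improving adaptability to varying terrain.
The framework is validated on a humanoid robot, showing a large improvement in locomotion stability on dynamic terrain and a significant reduction in development cost.
This work provides a practical solution for deploying robust locomotion strategies on humanoid robots.

Summary (original)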

Achieving robust locomotion on complex terrains remains a challenge due to high dimensional control and environmental uncertainties. This paper introduces a teacher prior framework based on the teacher student paradigm, integrating imitation and auxiliary task learning to improve learning efficiency and generalization. Unlike traditional paradigms that strongly rely on encoder-based state embeddings, our framework decouples the network design, simplifying the policy network and deployment. A high performance teacher policy is first trained using privileged information to acquire generalizable motion skills. The teacher’s motion distribution is transferred to the student policy, which relies only on noisy proprioceptive data, via a generative adversarial mechanism to mitigate performance degradation caused by distributional shifts. Additionally, auxiliary task learning enhances the student policy’s feature representation, speeding up convergence and improving adaptability to varying terrains. The framework is validated on a humanoid robot, showing a great improvement in locomotion stability on dynamic terrains and significant reductions in development costs. This work provides a practical solution for deploying robust locomotion strategies in humanoid robots.

arXiv information

Authors	Fangcheng Jin,Yuqi Wang,Peixin Ma,Guodong Yang,Pan Zhao,En Li,Zhengtao Zhang
Published	2025-04-14 16:36:56+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: 68T40, cs.AI, cs.RO | Comments are closed

Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling?

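Posted: April 15, 2025 | Author: jarxiv

Summary

Objective: This study investigates the potential of Large Language Models (LLMs) as an alternative to human expert elicitation for extracting structured causal knowledge and facilitating causal modeling in biometric and healthcare applications.
Material and Methods: LLM-generated causal structures, specifically Bayesian networks (BNs), were benchmarked against traditional statistical methods (e.g., Bayesian Information Criterion) using healthcare datasets.
Validation techniques included structural equation modeling (SEM) to verify relationships, and measures such as entropy, predictive accuracy, and robustness to compare network structures.
Results and Discussion: LLM-generated BNs showed lower entropy than expert-elicited and statistically generated BNs, suggesting higher confidence and precision in predictions.
However, limitations such as contextual constraints, hallucinated dependencies, and potential biases inherited from training data require further investigation.
Conclusion: LLMs represent a new frontier in expert elicitation for probabilistic causal modeling, and promise to improve transparency and reduce uncertainty in decision-making that uses such models.

Summary (original)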

Objective: This study investigates the potential of Large Language Models (LLMs) as an alternative to human expert elicitation for extracting structured causal knowledge and facilitating causal modeling in biometric and healthcare applications. Material and Methods: LLM-generated causal structures, specifically Bayesian networks (BNs), were benchmarked against traditional statistical methods (e.g., Bayesian Information Criterion) using healthcare datasets. Validation techniques included structural equation modeling (SEM) to verifying relationships, and measures such as entropy, predictive accuracy, and robustness to compare network structures. Results and Discussion: LLM-generated BNs demonstrated lower entropy than expert-elicited and statistically generated BNs, suggesting higher confidence and precision in predictions. However, limitations such as contextual constraints, hallucinated dependencies, and potential biases inherited from training data require further investigation. Conclusion: LLMs represent a novel frontier in expert elicitation for probabilistic causal modeling, promising to improve transparency and reduce uncertainty in the decision-making using such models.
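The abstract reads lower entropy as a sign of higher confidence. The sketch below shows that intuition with Shannon entropy over a single discrete predictive distribution. Using Shannon entropy here is an assumption, since the abstract does not state which entropy measure was applied to the networks, and the two example distributions are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical predictive distributions over three outcomes.
confident = [0.9, 0.05, 0.05]   # mass concentrated on one outcome
uncertain = [1 / 3, 1 / 3, 1 / 3]  # mass spread evenly

print(shannon_entropy(confident) < shannon_entropy(uncertain))
```

A concentrated distribution has lower entropy than a uniform one, which is why lower entropy is commonly interpreted as a more confident prediction.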

arXiv information

Authors	Olha Shaposhnyk,Daria Zahorska,Svetlana Yanushkevich
Published	2025-04-14 16:45:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG | Comments are closed

Performance of Large Language Models in Supporting Medical Diagnosis and Treatment

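Posted: April 15, 2025 | Author: jarxiv

Summary

Integrating Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning.
These AI-driven systems can analyze vast datasets and assist clinicians in identifying diseases, recommending treatments, and predicting patient outcomes.
This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized medical knowledge assessment.
Our results highlight considerable variation in accuracy and cost-effectiveness, with several models exceeding human benchmarks for medical students on this specific task.
We identify leading models based on a combined score of accuracy and cost, discuss the implications of reasoning methodologies such as Chain-of-Thought, and underscore the potential of LLMs as valuable complementary tools that aid medical professionals in complex clinical decision-making.

Summary (original)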

The integration of Large Language Models (LLMs) into healthcare holds significant potential to enhance diagnostic accuracy and support medical treatment planning. These AI-driven systems can analyze vast datasets, assisting clinicians in identifying diseases, recommending treatments, and predicting patient outcomes. This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access (PNA), a standardized medical knowledge assessment. Our results highlight considerable variation in accuracy and cost-effectiveness, with several models demonstrating performance exceeding human benchmarks for medical students on this specific task. We identify leading models based on a combined score of accuracy and cost, discuss the implications of reasoning methodologies like Chain-of-Thought, and underscore the potential for LLMs to function as valuable complementary tools aiding medical professionals in complex clinical decision-making.

arXiv information

Authors	Diogo Sousa,Guilherme Barbosa,Catarina Rocha,Dulce Oliveira
Published	2025-04-14 16:53:59+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.ET, cs.HC, I.2.7 | Comments are closed

COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling

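Posted: April 15, 2025 | Author: jarxiv

Summary

The therapeutic working alliance is a critical predictor of psychotherapy success.
Traditionally, working alliance assessment relies on questionnaires completed by both therapists and patients.
In this paper, we present COMPASS, a novel framework for inferring the therapeutic working alliance directly from the natural language used in psychotherapy sessions.
Our approach leverages advanced large language models (LLMs) to analyze session transcripts and map them to distributed representations.
These representations capture the semantic similarities between the dialogues and psychometric instruments such as the Working Alliance Inventory.
Analyzing a dataset of over 950 sessions collected between 1970 and 2012 and spanning diverse psychiatric conditions, including anxiety (N=498), depression (N=377), schizophrenia (N=71), and suicidal tendencies (N=12), we demonstrate that our method provides fine-grained mapping of patient-therapist alignment trajectories, offers interpretable insights for clinical practice, and identifies emerging patterns related to the condition being treated.
By employing various deep learning-based topic modeling techniques in combination with prompting generative language models, we analyze the topical characteristics of different psychiatric conditions and how these topics evolve during each turn of the conversation.
This integrated framework enhances the understanding of therapeutic interactions, enables timely feedback for therapists on the quality of therapeutic relationships, and provides clear, actionable insights for improving the effectiveness of psychotherapy.

Summary (original)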

The therapeutic working alliance is a critical predictor of psychotherapy success. Traditionally, working alliance assessment relies on questionnaires completed by both therapists and patients. In this paper, we present COMPASS, a novel framework to directly infer the therapeutic working alliance from the natural language used in psychotherapy sessions. Our approach leverages advanced large language models (LLMs) to analyze session transcripts and map them to distributed representations. These representations capture the semantic similarities between the dialogues and psychometric instruments, such as the Working Alliance Inventory. Analyzing a dataset of over 950 sessions spanning diverse psychiatric conditions — including anxiety (N=498), depression (N=377), schizophrenia (N=71), and suicidal tendencies (N=12) — collected between 1970 and 2012, we demonstrate the effectiveness of our method in providing fine-grained mapping of patient-therapist alignment trajectories, offering interpretable insights for clinical practice, and identifying emerging patterns related to the condition being treated. By employing various deep learning-based topic modeling techniques in combination with prompting generative language models, we analyze the topical characteristics of different psychiatric conditions and how these topics evolve during each turn of the conversation. This integrated framework enhances the understanding of therapeutic interactions, enables timely feedback for therapists on the quality of therapeutic relationships, and provides clear, actionable insights to improve the effectiveness of psychotherapy.
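The abstract describes mapping dialogue and questionnaire items into a shared representation and comparing them by semantic similarity. The sketch below illustrates that comparison with cosine similarity over toy vectors. The choice of cosine similarity, the helper `score_turn`, the item names, and the two-dimensional vectors are all assumptions for illustration; the abstract does not specify the embedding model or the similarity function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def score_turn(turn_vec, item_vecs):
    """Score one dialogue-turn embedding against each inventory-item embedding."""
    return {name: cosine_similarity(turn_vec, vec) for name, vec in item_vecs.items()}


# Hypothetical embeddings; a real system would obtain these from a language model.
turn_embedding = [1.0, 0.0]
inventory_items = {"item_a": [1.0, 0.0], "item_b": [0.0, 1.0]}

scores = score_turn(turn_embedding, inventory_items)
print(scores)
```

Repeating this scoring for every turn in a session would give a per-turn trajectory of item similarities, which is one way to picture the fine-grained mapping the abstract mentions.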

arXiv information

Authors	Baihan Lin,Djallel Bouneffouf,Yulia Landa,Rachel Jespersen,Cheryl Corcoran,Guillermo Cecchi
Published	2025-04-14 16:58:34+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.HC, cs.LG, q-bio.NC | Comments are closed

Towards Safer Chatbots: A Framework for Policy Compliance Evaluation of Custom GPTs

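Posted: April 15, 2025 | Author: jarxiv

Summary

Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and becoming deeply integrated into society.
The ability to fine-tune general-purpose LLMs such as Generative Pre-trained Transformers (GPT) for specific tasks has driven the emergence of numerous Custom GPTs.
These tailored models are increasingly available through dedicated marketplaces such as OpenAI's GPT Store.
However, their black-box nature introduces significant safety and compliance risks.
In this work, we present a scalable framework for the automated evaluation of Custom GPTs against OpenAI's usage policies, which define the permissible behaviors of these systems.
Our framework integrates three core components: (1) automated discovery and data collection of models from the GPT Store, (2) a red-teaming prompt generator tailored to specific policy categories and to the characteristics of each target GPT, and (3) an LLM-as-a-judge technique that analyzes each prompt-response pair for potential policy violations.
We validate the framework against a manually annotated ground truth and evaluate it through a large-scale study of 782 Custom GPTs across three categories: Romantic, Cybersecurity, and Academic GPTs.
Our manual annotation process achieved an F1 score of 0.975 in identifying policy violations, confirming the reliability of the framework's assessments.
The results reveal that 58.7% of the analyzed models show indications of non-compliance, exposing weaknesses in the GPT Store's review and approval processes.
Furthermore, our findings indicate that a model's popularity does not correlate with compliance, and that non-compliance issues stem largely from behaviors inherited from base models rather than from user-driven customizations.
We believe this approach can be extended to other chatbot platforms and policy domains, improving the safety of LLM-based systems.

Summary (original)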

Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and integrating deeply into society. The capability to fine-tune general-purpose LLMs, such as Generative Pre-trained Transformers (GPT), for specific tasks has facilitated the emergence of numerous Custom GPTs. These tailored models are increasingly made available through dedicated marketplaces, such as OpenAI’s GPT Store. However, their black-box nature introduces significant safety and compliance risks. In this work, we present a scalable framework for the automated evaluation of Custom GPTs against OpenAI’s usage policies, which define the permissible behaviors of these systems. Our framework integrates three core components: (1) automated discovery and data collection of models from the GPT store, (2) a red-teaming prompt generator tailored to specific policy categories and the characteristics of each target GPT, and (3) an LLM-as-a-judge technique to analyze each prompt-response pair for potential policy violations. We validate our framework with a manually annotated ground truth, and evaluate it through a large-scale study with 782 Custom GPTs across three categories: Romantic, Cybersecurity, and Academic GPTs. Our manual annotation process achieved an F1 score of 0.975 in identifying policy violations, confirming the reliability of the framework’s assessments. The results reveal that 58.7% of the analyzed models exhibit indications of non-compliance, exposing weaknesses in the GPT store’s review and approval processes. Furthermore, our findings indicate that a model’s popularity does not correlate with compliance, and non-compliance issues largely stem from behaviors inherited from base models rather than user-driven customizations. We believe this approach is extendable to other chatbot platforms and policy domains, improving LLM-based systems safety.
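The abstract reports an F1 score for identifying policy violations. For readers unfamiliar with the metric, the sketch below computes F1 from binary labels, comparing hypothetical judge verdicts against hypothetical manual annotations. The label lists are invented for illustration and are not data from the study.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks a policy violation."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six prompt-response pairs.
annotated = [1, 1, 1, 0, 0, 1]  # manual ground truth
judged = [1, 1, 0, 0, 1, 1]     # verdicts from an automated judge

f1 = f1_score(annotated, judged)
print(f"F1 = {f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it stays high only when the judge both avoids false alarms and misses few real violations.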

arXiv information

Authors	David Rodriguez,William Seymour,Jose M. Del Alamo,Jose Such
Published	2025-04-14 16:58:48+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, I.2.1 | Comments are closed
