Towards a Benchmark for Scientific Understanding in Humans and Machines

Summary

Title: Towards a Benchmark for Scientific Understanding in Humans and Machines

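Summary:
– Scientific understanding is a fundamental goal of science, allowing us to explain the world.
– There is currently no good way to measure the scientific understanding of agents, whether humans or Artificial Intelligence systems.
– Without a clear benchmark, it is difficult to evaluate and compare different levels of and approaches to scientific understanding.
– This Roadmap proposes a framework for creating a benchmark for scientific understanding, using tools from philosophy of science.
– The authors adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks.
– They extend this notion with a set of questions that gauge different levels of scientific understanding: information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances.
– The Scientific Understanding Benchmark (SUB), formed by a set of these tests, allows different approaches to be evaluated and compared.
– Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation.
– Aligning machine and human scientific understanding can improve their utility, ultimately advancing scientific understanding and helping to discover new insights within machines.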
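The three question levels listed above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the names `Level`, `Question`, and `score_by_level`, the grades in [0, 1], and the per-level averaging are all hypothetical illustrations and are not taken from the paper, which does not specify a scoring scheme in its abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three kinds of questions named in the abstract."""
    RETRIEVAL = "information retrieval"
    EXPLANATION = "arranging information into an explanation"
    COUNTERFACTUAL = "inferring what would differ under other circumstances"


@dataclass(frozen=True)
class Question:
    """A single benchmark item (hypothetical structure)."""
    level: Level
    prompt: str


def score_by_level(questions, grades):
    """Average the grades per level.

    Assumption: grades[i] is a number in [0, 1] assigned to questions[i]
    by some external grader; the paper does not prescribe this scheme.
    """
    if len(questions) != len(grades):
        raise ValueError("questions and grades must have the same length")
    per_level = defaultdict(list)
    for question, grade in zip(questions, grades):
        per_level[question.level].append(grade)
    return {level: sum(vals) / len(vals) for level, vals in per_level.items()}
```

Reporting one score per level, rather than a single total, would keep an agent that only retrieves facts distinguishable from one that can also explain and reason counterfactually; whether the SUB aggregates results this way is an open assumption here.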

Abstract (original)

Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances. The Scientific Understanding Benchmark (SUB), which is formed by a set of these tests, allows for the evaluation and comparison of different approaches. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding we can improve their utility, ultimately advancing scientific understanding and helping to discover new insights within machines.

arXiv information

Authors	Kristian Gonzalez Barman, Sascha Caron, Tom Claassen, Henk de Regt
Published	2023-04-21 08:57:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
