Efficient Biological Data Acquisition through Inference Set Design

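Summary

In drug discovery, highly automated, high-throughput laboratories are used to screen large numbers of compounds in search of effective drugs.
Because these experiments are expensive, one may wish to reduce costs by running experiments on only a subset of the compounds and predicting the outcomes of the rest.
In this work, we model this scenario as a sequential subset selection problem: the goal is to select the smallest set of candidates that achieves a desired level of accuracy for the system as a whole.
Our key observation is that, when the difficulty of the prediction problem varies across the input space, selectively obtaining labels for the hardest examples in the acquisition pool leaves only relatively easy examples in the inference set, which improves overall system performance.
We call this mechanism inference set design and propose using a confidence-based active learning approach to prune out these difficult examples.
Our algorithm includes an explicit stopping criterion that halts experimentation once it is sufficiently confident that the system has reached the target performance.
Empirical studies on image and molecular datasets, as well as on a real-world, large-scale biological assay, show that active learning for inference set design significantly reduces experimental cost while maintaining high system performance.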
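The inference-set mechanism can be illustrated with a minimal sketch. It assumes three things: experimentally measured (labeled) compounds are reported without error, the rest of the pool (the inference set) is reported using model predictions, and "confidence" means the model's probability for its predicted outcome. The helper names `system_accuracy` and `acquire_least_confident` are hypothetical. The model is also held fixed rather than retrained between rounds, so the sketch shows only the inference-set effect, not the authors' full algorithm.

```python
from typing import List, Sequence, Set


def system_accuracy(pred_correct: Sequence[bool], labeled: Set[int]) -> float:
    """Fraction of the whole pool whose outcome the system reports correctly.

    Labeled (measured) examples are assumed correct; every other example
    belongs to the inference set and counts only if the model got it right.
    """
    n = len(pred_correct)
    hits = sum(1 for i in range(n) if i in labeled or pred_correct[i])
    return hits / n


def acquire_least_confident(
    confidence: Sequence[float], labeled: Set[int], k: int
) -> List[int]:
    """Return the k unlabeled indices with the lowest model confidence."""
    candidates = [i for i in range(len(confidence)) if i not in labeled]
    candidates.sort(key=lambda i: confidence[i])
    return candidates[:k]


if __name__ == "__main__":
    # Toy pool of six compounds: the model is wrong exactly where it is unsure.
    confidence = [0.99, 0.95, 0.55, 0.90, 0.60, 0.97]
    pred_correct = [True, True, False, True, False, True]

    print(round(system_accuracy(pred_correct, set()), 3))  # 0.667
    picked = acquire_least_confident(confidence, set(), k=2)
    print(picked)  # [2, 4]
    print(system_accuracy(pred_correct, set(picked)))  # 1.0
    print(round(system_accuracy(pred_correct, {0, 1}), 3))  # 0.667
```

Acquiring the two least-confident compounds removes both errors from the inference set and lifts system accuracy from 4/6 to 1.0. Spending the same budget on two compounds the model already predicts correctly leaves accuracy at 4/6.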

Summary (Original)

In drug discovery, highly automated high-throughput laboratories are used to screen a large number of compounds in search of effective drugs. These experiments are expensive, so one might hope to reduce their cost by experimenting on a subset of the compounds, and predicting the outcomes of the remaining experiments. In this work, we model this scenario as a sequential subset selection problem: we aim to select the smallest set of candidates in order to achieve some desired level of accuracy for the system as a whole. Our key observation is that, if there is heterogeneity in the difficulty of the prediction problem across the input space, selectively obtaining the labels for the hardest examples in the acquisition pool will leave only the relatively easy examples to remain in the inference set, leading to better overall system performance. We call this mechanism inference set design, and propose the use of a confidence-based active learning solution to prune out these challenging examples. Our algorithm includes an explicit stopping criterion that stops running the experiments when it is sufficiently confident that the system has reached the target performance. Our empirical studies on image and molecular datasets, as well as a real-world large-scale biological assay, show that active learning for inference set design leads to significant reduction in experimental cost while retaining high system performance.
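The abstract does not spell out the form of the stopping criterion. The sketch below illustrates one possible form by swapping in a Hoeffding-style lower confidence bound. It assumes the model's confidences are calibrated and its errors independent. Under those assumptions, the expected number of correct predictions on the inference set is the sum of the confidences, and Hoeffding's inequality gives a lower bound that holds with probability at least 1 - delta. Experiments stop once that lower bound on system accuracy reaches the target. The names `accuracy_lower_bound`, `should_stop`, and `delta` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def accuracy_lower_bound(
    inference_conf: Sequence[float], n_labeled: int, delta: float = 0.05
) -> float:
    """Lower confidence bound on system accuracy (hypothetical helper).

    Assumes each inference-set prediction is correct independently with
    probability equal to its (calibrated) confidence, and that labeled
    examples are always correct.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    m = len(inference_conf)
    if m == 0:
        return 1.0  # every example was measured, nothing is predicted
    expected_correct = sum(inference_conf)
    # Hoeffding: P(S <= E[S] - t) <= exp(-2 t^2 / m) for a sum of m
    # independent [0, 1] variables; solve exp(-2 t^2 / m) = delta for t.
    slack = math.sqrt(m * math.log(1.0 / delta) / 2.0)
    lower_correct = max(0.0, expected_correct - slack)
    return (n_labeled + lower_correct) / (n_labeled + m)


def should_stop(
    inference_conf: Sequence[float], n_labeled: int, target: float, delta: float = 0.05
) -> bool:
    """Stop experimenting once the bound clears the target accuracy."""
    return accuracy_lower_bound(inference_conf, n_labeled, delta) >= target


if __name__ == "__main__":
    conf = [0.99] * 900  # 900 compounds left to predict
    print(round(accuracy_lower_bound(conf, n_labeled=100), 3))  # 0.954
    print(should_stop(conf, 100, target=0.95))  # True
    print(should_stop(conf, 100, target=0.96))  # False
```

Take 100 measured compounds and 900 predicted at confidence 0.99. The bound is then roughly 0.954, so a target of 0.95 would stop acquisition and a target of 0.96 would not. A bound is used instead of the raw expected accuracy because the criterion is meant to stop only when it is *sufficiently confident* that the target has been reached.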

arXiv information

Authors	Ihor Neporozhnii, Julien Roy, Emmanuel Bengio, Jason Hartford
Published	2024-11-25 17:51:33+00:00
arXiv site	arxiv_id(pdf)

Source, Services Used

arxiv.jp, Google
