Contextualizing biological perturbation experiments through language

Summary

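Advanced perturbation experiments let scientists probe biomolecular systems at unprecedented resolution, but experimental and analysis costs are a major barrier to widespread adoption.
Machine learning has the potential to guide efficient exploration of the perturbation space and to extract new insights from these data.
However, current approaches ignore the semantic richness of the underlying biology, and their objectives are misaligned with downstream biological analyses.
In this paper, we hypothesize that large language models (LLMs) offer a natural medium for representing complex biological relationships and rationalizing experimental outcomes.
We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments.
Unlike current benchmarks, which mainly interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: predicting differential expression and direction of change for unseen perturbations, and gene set enrichment.
We evaluate state-of-the-art machine learning and statistical approaches to modeling perturbations, as well as standard LLM reasoning strategies, and find that current methods perform poorly on PerturbQA.
As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework that matches or exceeds the current state of the art. Code and data are publicly available at https://github.com/genentech/perturbqa.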

Summary (original)

High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the relevant biology, and their objectives are misaligned with downstream biological analyses. In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments. Unlike current benchmarks that primarily interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: prediction of differential expression and change of direction for unseen perturbations, and gene set enrichment. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations, as well as standard LLM reasoning strategies, and we find that current methods perform poorly on PerturbQA. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework that matches or exceeds the current state-of-the-art. Our code and data are publicly available at https://github.com/genentech/PerturbQA.

arXiv information

Authors	Menghua Wu, Russell Littman, Jacob Levine, Lin Qiu, Tommaso Biancalani, David Richmond, Jan-Christian Huetter
Published	2025-02-28 18:15:31+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
