Repurposing the scientific literature with vision-language models

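Summary

Leading vision-language models (VLMs) are trained on general Internet content and overlook the rich, domain-specific knowledge contained in scientific journals.
Training on specialty-specific literature could yield high-performance, task-specific tools, allowing generative AI to match generalist models in specialty publishing, educational, and clinical tasks.
We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs).
Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions).
We used these questions to train CNS-Obsidian, a 34B-parameter VLM.
In a blinded, randomized controlled trial, our model demonstrated non-inferiority to the then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p = 0.1150; accuracy, 59.38% vs. 65.79%, p = 0.3797).
Our pilot study shows that training generative AI models on specialty-specific journal content, without large-scale Internet data, yields high-performance academic and clinical tools and enables domain-tailored AI across diverse fields.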

Summary (original)

Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals’ rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to the then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content – without large-scale internet data – results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.

arXiv information

Authors	Anton Alyakin,Jaden Stryker,Daniel Alexander Alber,Karl L. Sangwon,Jin Vivian Lee,Brandon Duderstadt,Akshay Save,David Kurland,Spencer Frome,Shrutika Singh,Jeff Zhang,Eunice Yang,Ki Yun Park,Cordelia Orillac,Aly A. Valliani,Sean Neifert,Albert Liu,Aneek Patel,Christopher Livia,Darryl Lau,Ilya Laufer,Peter A. Rozman,Eveline Teresa Hidalgo,Howard Riina,Rui Feng,Todd Hollon,Yindalon Aphinyanaphongs,John G. Golfinos,Laura Snyder,Eric Leuthardt,Douglas Kondziolka,Eric Karl Oermann
Published	2025-04-25 13:29:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
