Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations

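Summary

In biotechnology, Raman spectroscopy is rapidly gaining popularity as a process analytical technology (PAT) for measuring cell density and substrate and product concentrations.
Because it records the vibrational modes of molecules, it provides this information non-invasively in a single spectrum.
Partial least squares (PLS) is typically the model of choice for inferring the variables of interest from the spectra.
However, biological processes are known for their complexity, and convolutional neural networks (CNNs) offer a powerful alternative.
CNNs can handle non-Gaussian noise and account for beam misalignment, pixel malfunctions, or the presence of additional substances.
However, they require large amounts of training data, and they pick up non-linear dependencies among the process variables.
In this work, we exploit the additive nature of spectra to generate, from a given dataset, additional data points whose labels are statistically independent, so that a network trained on such data exhibits low correlation between its predictions.
We show that training a CNN on these generated data points improves performance on datasets whose annotations do not share the correlations of the dataset used for training.
This data augmentation technique makes it possible to reuse spectra as training data in new contexts that exhibit different correlations.
The additional data allows better, more robust models to be built.
This matters in scenarios where large amounts of historical data are available but are not currently used for model training.
We demonstrate the capabilities of the proposed method on synthetic spectra of Ralstonia eutropha batch cultivations, monitoring substrate, biomass, and polyhydroxyalkanoate (PHA) biopolymer concentrations during the experiments.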
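
The additive idea can be illustrated with a minimal sketch. It is not the authors' scheme, only one simple way to use additivity: if a spectrum is approximately a concentration-weighted sum of component spectra, then a weighted sum of two measured spectra corresponds to the same weighted sum of their labels. The function name `augment_additive`, the Gaussian peak positions, the toy batch trajectory, and the uniform mixing weights are all assumptions made for illustration; real spectra also carry baseline and noise, which get summed along with the signal.

```python
import numpy as np

def augment_additive(X, Y, n_new, rng, n_mix=2):
    """Create new (spectrum, label) pairs as random weighted sums of existing pairs.

    X: (n, p) spectra, Y: (n, k) labels. Hypothetical helper, not from the paper.
    """
    n = X.shape[0]
    idx = rng.integers(0, n, size=(n_new, n_mix))          # which samples to mix
    w = rng.uniform(0.0, 1.0, size=(n_new, n_mix))         # independent mixing weights
    X_new = np.einsum("ij,ijk->ik", w, X[idx])             # weighted sum of spectra
    Y_new = np.einsum("ij,ijk->ik", w, Y[idx])             # same weights on the labels
    return X_new, Y_new

rng = np.random.default_rng(0)

# Toy batch trajectory (assumed): substrate falls while biomass and PHA rise,
# so the labels are strongly correlated.
t = np.linspace(0.0, 1.0, 50)
Y = np.column_stack([10.0 * (1.0 - t), 5.0 * t, 2.0 * t**2])

# Assumed pure-component spectra: one Gaussian peak per component.
wavenumbers = np.linspace(400.0, 1800.0, 200)
centers = np.array([600.0, 1000.0, 1450.0])
S = np.exp(-0.5 * ((wavenumbers[None, :] - centers[:, None]) / 30.0) ** 2)

# Noise-free linear mixing model: spectrum = sum of concentration * component spectrum.
X = Y @ S

X_aug, Y_aug = augment_additive(X, Y, n_new=2000, rng=rng)

corr_orig = np.corrcoef(Y.T)[0, 1]      # substrate vs. biomass, original data
corr_aug = np.corrcoef(Y_aug.T)[0, 1]   # substrate vs. biomass, augmented data
print(f"label correlation before: {corr_orig:.2f}, after: {corr_aug:.2f}")
```

Because the weights are drawn independently and are not forced to sum to one, the augmented labels leave the original trajectory, which is what weakens the correlation between them in this toy setting. Whether the same holds for measured spectra depends on how well the additive assumption is satisfied.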

Summary (original)

In biotechnology, Raman spectroscopy is rapidly gaining popularity as a process analytical technology (PAT) that measures cell densities and substrate and product concentrations. As it records vibrational modes of molecules, it provides that information non-invasively in a single spectrum. Typically, partial least squares (PLS) is the model of choice to infer information about variables of interest from the spectra. However, biological processes are known for their complexity, and here convolutional neural networks (CNNs) present a powerful alternative. They can handle non-Gaussian noise and account for beam misalignment, pixel malfunctions or the presence of additional substances. However, they require a lot of data during model training, and they pick up non-linear dependencies in the process variables. In this work, we exploit the additive nature of spectra in order to generate additional data points from a given dataset that have statistically independent labels so that a network trained on such data exhibits low correlations between the model predictions. We show that training a CNN on these generated data points improves the performance on datasets where the annotations do not bear the same correlation as the dataset that was used for model training. This data augmentation technique enables us to reuse spectra as training data for new contexts that exhibit different correlations. The additional data allows for building a better and more robust model. This is of interest in scenarios where large amounts of historical data are available but are currently not used for model training. We demonstrate the capabilities of the proposed method using synthetic spectra of Ralstonia eutropha batch cultivations to monitor substrate, biomass and polyhydroxyalkanoate (PHA) biopolymer concentrations during the experiments.

arXiv information

Authors	Christoph Lange, Isabel Thiele, Lara Santolin, Sebastian L. Riedel, Maxim Borisyak, Peter Neubauer, M. Nicolas Cruz Bournazou
Published	2024-02-01 18:46:28+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
