LMRPA: Large Language Model-Driven Efficient Robotic Process Automation for OCR

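Summary

This paper introduces LMRPA, a new large-model-driven robotic process automation (RPA) model designed to significantly improve the efficiency and speed of optical character recognition (OCR) tasks.
Conventional RPA platforms often hit performance bottlenecks when handling high-volume repetitive processes such as OCR, which makes those processes slower and less efficient.
LMRPA integrates large language models (LLMs) to improve the accuracy and readability of the extracted text, overcoming the challenges posed by ambiguous characters and complex text structures.
Extensive benchmarks compared LMRPA with leading RPA platforms, including UiPath and Automation Anywhere, using OCR engines such as Tesseract and DocTR.
The results show that LMRPA delivers superior performance, reducing processing time by up to 52%.
For example, in Batch 2 of the Tesseract OCR task, LMRPA completed the process in 9.8 seconds, whereas UiPath took 18.1 seconds and Automation Anywhere took 18.7 seconds.
Similar improvements were observed with DocTR: LMRPA completed the task in 12.7 seconds, while competing tools took more than 20 seconds for the same work.
These findings highlight LMRPA's potential to transform OCR-driven automation by offering a more efficient and effective alternative to existing state-of-the-art RPA models.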
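
To make the quoted timings easier to compare, the relative time reduction for the Batch 2 Tesseract example can be worked out directly from the numbers in the abstract. The snippet below is only an illustrative sketch: the helper name `percent_reduction` is hypothetical and not part of LMRPA, and the only data used are the three timings stated above.

```python
def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Relative reduction in processing time, as a percentage of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_s - new_s) / baseline_s * 100.0


# Batch 2, Tesseract OCR task: timings in seconds, as quoted in the abstract.
LMRPA_S = 9.8
baselines = {"UiPath": 18.1, "Automation Anywhere": 18.7}

# Hypothetical helper applied to each baseline platform.
reductions = {name: percent_reduction(t, LMRPA_S) for name, t in baselines.items()}

for name, reduction in reductions.items():
    print(f"LMRPA vs {name}: {reduction:.1f}% less time")
```

For this batch the reductions come out to roughly 46% against UiPath and 48% against Automation Anywhere. Both are below the headline figure, which is consistent with the abstract's wording of "up to 52%"; the batch that reaches that maximum is not identified in the abstract.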

Abstract (original)

This paper introduces LMRPA, a novel Large Model-Driven Robotic Process Automation (RPA) model designed to greatly improve the efficiency and speed of Optical Character Recognition (OCR) tasks. Traditional RPA platforms often suffer from performance bottlenecks when handling high-volume repetitive processes like OCR, leading to a less efficient and more time-consuming process. LMRPA allows the integration of Large Language Models (LLMs) to improve the accuracy and readability of extracted text, overcoming the challenges posed by ambiguous characters and complex text structures. Extensive benchmarks were conducted comparing LMRPA to leading RPA platforms, including UiPath and Automation Anywhere, using OCR engines like Tesseract and DocTR. The results are that LMRPA achieves superior performance, cutting the processing times by up to 52%. For instance, in Batch 2 of the Tesseract OCR task, LMRPA completed the process in 9.8 seconds, where UiPath finished in 18.1 seconds and Automation Anywhere finished in 18.7 seconds. Similar improvements were observed with DocTR, where LMRPA outperformed other automation tools conducting the same process by completing tasks in 12.7 seconds, while competitors took over 20 seconds to do the same. These findings highlight the potential of LMRPA to revolutionize OCR-driven automation processes, offering a more efficient and effective alternative solution to the existing state-of-the-art RPA models.

arXiv information

Authors	Osama Hosam Abdellaif,Abdelrahman Nader,Ali Hamdi
Published	2024-12-24 00:21:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
