jarxiv | Japanese arxiv

Enhancing Sentiment Analysis in Bengali Texts: A Hybrid Approach Using Lexicon-Based Algorithm and Pretrained Language Model Bangla-BERT

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

Sentiment analysis (SA) is a process of identifying the emotional tone or polarity within a given text and aims to uncover the user’s complex emotions and inner feelings. While sentiment analysis has been extensively studied for languages like English, research in Bengali remains limited, particularly for fine-grained sentiment categorization. This work aims to bridge this gap by developing a novel approach that integrates rule-based algorithms with pre-trained language models. We developed a dataset from scratch, comprising over 15,000 manually labeled reviews. Next, we constructed a Lexicon Data Dictionary, assigning polarity scores to the reviews. We developed a novel rule-based algorithm, Bangla Sentiment Polarity Score (BSPS), an approach capable of generating sentiment scores and classifying reviews into nine distinct sentiment categories. To assess the performance of this method, we evaluated the classified sentiments using BanglaBERT, a pre-trained transformer-based language model. We also performed sentiment classification directly with BanglaBERT on the original data and evaluated this model’s results. Our analysis revealed that the BSPS + BanglaBERT hybrid approach outperformed the standalone BanglaBERT model, achieving higher accuracy, precision, and nuanced classification across the nine sentiment categories. The results of our study emphasize the value and effectiveness of combining rule-based and pre-trained language model approaches for enhanced sentiment analysis in Bengali and suggest pathways for future research and application in languages with similar linguistic complexities.
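The abstract does not spell out the BSPS scoring rules. As a rough illustration of how a lexicon-based polarity score can be mapped onto nine ordinal categories, here is a minimal Python sketch. The lexicon entries, the intensifier weight, the negation rule and the mapping of scores to the levels -4 to +4 are all hypothetical assumptions, not the authors' algorithm.

```python
# Hypothetical lexicon-based polarity scorer; NOT the published BSPS rules.
# All lexicon values, intensifier weights and category cut-offs are assumptions.

LEXICON = {"ভালো": 2.0, "চমৎকার": 3.0, "খারাপ": -2.0}  # good, excellent, bad
INTENSIFIERS = {"খুব": 1.5}                           # "very"
NEGATORS = {"না"}                                     # post-posed "not"


def polarity_score(tokens):
    """Sum lexicon polarities, scaling a word by a preceding intensifier and
    flipping the most recent sentiment word when a negator follows it."""
    score = 0.0
    boost = 1.0
    last = 0.0  # contribution of the most recent sentiment word
    for tok in tokens:
        if tok in INTENSIFIERS:
            boost = INTENSIFIERS[tok]
        elif tok in LEXICON:
            last = LEXICON[tok] * boost
            score += last
            boost = 1.0
        elif tok in NEGATORS:
            score -= 2 * last  # remove the contribution, then add its negation
            last = -last
    return score


def to_category(score):
    """Clamp the rounded score to one of nine ordinal levels, -4 .. +4."""
    return max(-4, min(4, round(score)))
```

For example, `polarity_score("খাবার খুব ভালো".split())` scores only the intensified "ভালো" (2.0 × 1.5 = 3.0), which `to_category` maps to level 3. A hybrid setup like the one described would then use such rule-derived labels alongside a pretrained model rather than on their own.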

arXiv information

Authors	Hemal Mahmud, Hasan Mahmud, Mohammad Rifat Ahmmad Rashid
Published	2025-04-23 17:18:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG

Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these inverse problems for 3D structure determination, but are specialized for a predefined type of measurement. Here, we introduce a versatile framework to turn biophysical measurements, such as cryo-EM density maps, into 3D atomic models. Our method combines a physics-based forward model of the measurement process with a pretrained generative model providing a task-agnostic, data-driven prior. Our method outperforms posterior sampling baselines on linear and non-linear inverse problems. In particular, it is the first diffusion-based method for refining atomic models from cryo-EM maps and building atomic models from sparse distance matrices.
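The abstract describes combining a physics-based forward model with a pretrained generative prior for posterior sampling. The sketch below illustrates only that combination, in one dimension, using unadjusted Langevin dynamics: each update adds the prior's score to the gradient of the measurement log-likelihood. It is a deliberately simplified stand-in, not the authors' method: the "prior" is an analytic standard Gaussian rather than a diffusion model, and the forward model is a hypothetical scalar linear map y = a·x plus Gaussian noise.

```python
import math
import random


def langevin_posterior_samples(y, a, sigma2, step, n_burn, n_samples, seed=0):
    """Unadjusted Langevin sampling from p(x | y) for the toy model
    x ~ N(0, 1) (prior) and y = a * x + noise, noise ~ N(0, sigma2)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for t in range(n_burn + n_samples):
        prior_score = -x                              # d/dx log N(x; 0, 1)
        likelihood_score = a * (y - a * x) / sigma2   # d/dx log N(y; a*x, sigma2)
        noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        x = x + step * (prior_score + likelihood_score) + noise
        if t >= n_burn:
            samples.append(x)
    return samples


def exact_posterior_mean(y, a, sigma2):
    """Closed-form posterior mean of the same linear-Gaussian model."""
    return a * y / (a * a + sigma2)
```

With y = 2, a = 1 and sigma2 = 1 the exact posterior mean is 1.0, and the empirical mean of `langevin_posterior_samples(2.0, 1.0, 1.0, 0.1, 1000, 20000)` should land close to it. A finite step size slightly inflates the sampled variance; in the method described above, a learned diffusion prior would replace `prior_score` and a physics-based forward model would replace the scalar map.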

arXiv information

Authors	Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frédéric Poitevin, Ellen D. Zhong, Gordon Wetzstein
Published	2025-04-23 17:35:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG

Application of an attention-based CNN-BiLSTM framework for in vivo two-photon calcium imaging of neuronal ensembles: decoding complex bilateral forelimb movements from unilateral M1

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

Decoding behavior, such as movement, from multiscale brain networks remains a central objective in neuroscience. Over the past decades, artificial intelligence and machine learning have played an increasingly significant role in elucidating the neural mechanisms underlying motor function. The advancement of brain-monitoring technologies, capable of capturing complex neuronal signals with high spatial and temporal resolution, necessitates the development and application of more sophisticated machine learning models for behavioral decoding. In this study, we employ a hybrid deep learning framework, an attention-based CNN-BiLSTM model, to decode skilled and complex forelimb movements using signals obtained from in vivo two-photon calcium imaging. Our findings demonstrate that the intricate movements of both ipsilateral and contralateral forelimbs can be accurately decoded from unilateral M1 neuronal ensembles. These results highlight the efficacy of advanced hybrid deep learning models in capturing the spatiotemporal dependencies of neuronal network activity linked to complex movement execution.
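The abstract names an attention-based CNN-BiLSTM but gives no layer details. A common way to add attention on top of a recurrent encoder is to score each time step, softmax the scores, and take the weighted sum of hidden states as a summary vector. The minimal pure-Python sketch below shows only that pooling step; the scoring vector `w` and the toy hidden states are hypothetical, and the CNN and BiLSTM layers that would produce those states are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Dot-product attention pooling.

    hidden_states: list of T vectors (e.g. per-time-step BiLSTM outputs).
    w: scoring vector of the same dimension (learned in a real model).
    Returns the attention-weighted context vector and the weights.
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[j] for a, h in zip(alphas, hidden_states)) for j in range(dim)]
    return context, alphas
```

The weights `alphas` sum to 1, so they can be read as how much each imaging frame contributes to the decoded movement; with an all-zero `w` the pooling degenerates to a plain average over time.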

arXiv information

Authors	Ghazal Mirzaee, Jonathan Chang, Shahrzad Latifi
Published	2025-04-23 17:43:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, q-bio.NC

Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

High-speed off-road autonomous driving presents unique challenges due to complex, evolving terrain characteristics and the difficulty of accurately modeling terrain-vehicle interactions. While dynamics models used in model-based control can be learned from real-world data, they often struggle to generalize to unseen terrain, making real-time adaptation essential. We propose a novel framework that combines a Kalman filter-based online adaptation scheme with meta-learned parameters to address these challenges. Offline meta-learning optimizes the basis functions along which adaptation occurs, as well as the adaptation parameters, while online adaptation dynamically adjusts the onboard dynamics model in real time for model-based control. We validate our approach through extensive experiments, including real-world testing on a full-scale autonomous off-road vehicle, demonstrating that our method outperforms baseline approaches in prediction accuracy, performance, and safety metrics, particularly in safety-critical scenarios. Our results underscore the effectiveness of meta-learned dynamics model adaptation, advancing the development of reliable autonomous systems capable of navigating diverse and unseen environments. Video is available at: https://youtu.be/cCKHHrDRQEA
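One way to read "Kalman filter-based online adaptation" along meta-learned basis functions is to model a predicted quantity as a linear combination of basis-function values, y ≈ phi(x)^T w, and treat the weights w as the Kalman state. The sketch below implements that reading with a random-walk process model (noise `q`) and a scalar observation (noise `R`). The basis values, noise levels and prior covariance are illustrative assumptions; in the paper, the basis functions and adaptation parameters are what offline meta-learning supplies.

```python
def kalman_update(w, P, phi, y, R, q=0.0):
    """One Kalman predict/update step for weights w in y = phi^T w + noise.

    w:   current weight estimate (list of d floats)
    P:   d x d covariance of w (list of lists)
    phi: basis-function values at the current state (list of d floats)
    y:   observed scalar target, R: observation noise variance
    q:   random-walk process noise added to each diagonal entry of P
    """
    d = len(w)
    # Predict: weights follow a random walk, so only the covariance grows.
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(d)] for i in range(d)]
    # Innovation variance S = phi^T P phi + R and gain K = P phi / S.
    P_phi = [sum(P[i][k] * phi[k] for k in range(d)) for i in range(d)]
    S = sum(phi[i] * P_phi[i] for i in range(d)) + R
    K = [P_phi[i] / S for i in range(d)]
    # Correct the weights with the prediction error.
    innovation = y - sum(phi[i] * w[i] for i in range(d))
    w = [w[i] + K[i] * innovation for i in range(d)]
    # Covariance update: P <- P - K (phi^T P).
    phi_P = [sum(phi[k] * P[k][j] for k in range(d)) for j in range(d)]
    P = [[P[i][j] - K[i] * phi_P[j] for j in range(d)] for i in range(d)]
    return w, P
```

Starting from w = [0, 0] and feeding observations generated by y = 2·phi0 - phi1, repeated calls drive the estimate toward [2, -1]. A nonzero `q` keeps the covariance from collapsing, which is what lets the weights keep tracking terrain that changes over time.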

arXiv information

Authors	Jacob Levy, Jason Gibson, Bogdan Vlahov, Erica Tevere, Evangelos Theodorou, David Fridovich-Keil, Patrick Spieler
Published	2025-04-23 17:51:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, cs.RO, cs.SY, eess.SY

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

Recent advancements in mixed-modal generative models have opened new avenues for developing unified biomedical assistants capable of analyzing biomedical images, answering complex questions about them, and generating multimodal patient reports. However, existing datasets face challenges such as small sizes, limited coverage of biomedical tasks and domains, and a reliance on narrow sources. To address these gaps, we present MedMax, a large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models. With 1.47 million instances, MedMax encompasses a diverse range of tasks, including interleaved image-text generation, biomedical image captioning and generation, visual chat, and report understanding. These tasks span knowledge across diverse biomedical domains, including radiology and histopathology, grounded in medical papers and YouTube videos. Subsequently, we fine-tune a mixed-modal foundation model on the MedMax dataset, achieving significant performance improvements: a 26% gain over the Chameleon model and an 18.3% improvement over GPT-4o across 12 downstream biomedical visual question-answering tasks. Finally, we introduce a unified evaluation suite for biomedical tasks to guide the development of mixed-modal biomedical AI assistants. The data, model, and code are available at https://mint-medmax.github.io/.

arXiv information

Authors	Hritik Bansal, Daniel Israel, Siyan Zhao, Shufan Li, Tung Nguyen, Aditya Grover
Published	2025-04-23 06:29:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CV

ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

Recently, large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation and improving hardware design efficiency. However, the existing approaches to fine-tune LLMs for RTL generation are typically conducted on fixed datasets, which do not fully stimulate the capability of LLMs and require large amounts of reference data, which are costly to acquire. To mitigate these issues, we innovatively introduce an iterative training paradigm named ITERTL. During each iteration, samples are drawn from the model trained in the previous cycle. Then these new samples are employed for training in the current loop. Furthermore, we introduce a plug-and-play data filtering strategy, thereby encouraging the model to generate high-quality, self-contained code. Our model outperforms GPT4 and state-of-the-art (SOTA) open-source models, achieving a remarkable 53.8% pass@1 rate on the VerilogEval-human benchmark. Under similar conditions of data quantity and quality, our approach significantly outperforms the baseline. Extensive experiments validate the effectiveness of the proposed method.
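The 53.8% figure is a pass@1 rate, and the abstract does not say how it is computed. A widely used convention for code-generation benchmarks is the unbiased pass@k estimator introduced with HumanEval: draw n samples per problem, count the c that pass the tests, and estimate 1 - C(n-c, k) / C(n, k). Assuming that convention, a per-problem implementation in product form (which avoids huge binomial coefficients) looks like this; the benchmark score would then be the mean over problems.

```python
def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass, k: budget.
    Computes 1 - C(n - c, k) / C(n, k) as a running product.
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    prod = 1.0
    for i in range(n - c + 1, n + 1):
        prod *= 1.0 - k / i
    return 1.0 - prod
```

For k = 1 the estimate reduces to c / n, so 3 passing samples out of 10 give pass@1 = 0.3.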

arXiv information

Authors	Peiyang Wu, Nan Guo, Xiao Xiao, Wenming Li, Xiaochun Ye, Dongrui Fan
Published	2025-04-23 06:56:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the feature level, reusing top-layer features from the target model to achieve better results than vanilla speculative sampling. A growing trend in the LLM community is scaling up training data to improve model intelligence without increasing inference costs. However, we observe that scaling up data provides limited improvements for EAGLE. We identify that this limitation arises from EAGLE’s feature prediction constraints. In this paper, we introduce EAGLE-3, which abandons feature prediction in favor of direct token prediction and replaces reliance on top-layer features with multi-layer feature fusion via a technique named training-time test. These improvements significantly enhance performance and enable the draft model to fully benefit from scaling up training data. Our experiments include both chat models and reasoning models, evaluated on five tasks. The results show that EAGLE-3 achieves a speedup ratio of up to 6.5x, with about 1.4x improvement over EAGLE-2. In the SGLang framework, EAGLE-3 achieves a 1.38x throughput improvement at a batch size of 64. The code is available at https://github.com/SafeAILab/EAGLE.
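EAGLE-3 changes how the draft model is built, but drafted tokens are still verified by the target model. As background, the standard speculative sampling rule accepts a drafted token x with probability min(1, p(x)/q(x)), where p is the target model's distribution and q the draft model's, and on rejection resamples from the normalized residual max(0, p - q); this keeps the output distributed exactly as p. The toy sketch below applies that rule to a single token over a tiny dictionary vocabulary; it does not model the EAGLE-3 draft model itself.

```python
import random


def residual_distribution(p, q):
    """Normalized max(0, p - q): where to resample after a rejection."""
    diff = {tok: max(0.0, p.get(tok, 0.0) - q.get(tok, 0.0)) for tok in p}
    total = sum(diff.values())
    return {tok: v / total for tok, v in diff.items()}


def verify_token(draft_token, p, q, rng):
    """Accept draft_token with probability min(1, p/q), else resample.

    p, q: dicts mapping token -> probability (target and draft models).
    draft_token must have been sampled from q, so q[draft_token] > 0.
    Returns (token, accepted_flag).
    """
    accept_prob = min(1.0, p.get(draft_token, 0.0) / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    resid = residual_distribution(p, q)
    tokens = list(resid)
    return rng.choices(tokens, weights=[resid[t] for t in tokens])[0], False
```

When the draft distribution matches the target exactly, every drafted token is accepted; the closer a draft model tracks the target, the more tokens survive each verification pass, which is where the speedup comes from.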

arXiv information

Authors	Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
Published	2025-04-23 07:08:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

Large language models (LLMs) have emerged as a potential solution to automate the complex processes involved in writing literature reviews, such as literature collection, organization, and summarization. However, it is yet unclear how good LLMs are at automating comprehensive and reliable literature reviews. This study introduces a framework to automatically evaluate the performance of LLMs in three key tasks of literature writing: reference generation, literature summary, and literature review composition. We introduce multidimensional evaluation metrics that assess the hallucination rates in generated references and measure the semantic coverage and factual consistency of the literature summaries and compositions against human-written counterparts. The experimental results reveal that even the most advanced models still generate hallucinated references, despite recent progress. Moreover, we observe that the performance of different models varies across disciplines when it comes to writing literature reviews. These findings highlight the need for further research and development to improve the reliability of LLMs in automating academic literature reviews.
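The abstract does not define the hallucination metric for generated references. A minimal illustration, assuming that a reference counts as hallucinated when its title cannot be matched in a trusted bibliographic index, is the fraction of unmatched titles after light normalization. Exact title matching and the normalization rules here are hypothetical simplifications, not the paper's evaluation procedure.

```python
def normalize_title(title):
    """Lower-case, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def hallucination_rate(generated_titles, known_titles):
    """Fraction of generated reference titles absent from the known index."""
    if not generated_titles:
        return 0.0
    index = {normalize_title(t) for t in known_titles}
    missing = sum(1 for t in generated_titles if normalize_title(t) not in index)
    return missing / len(generated_titles)
```

Normalization makes the check tolerant of casing and punctuation differences, so "Attention Is All You Need" matches "attention is all you need." in the index, while a fabricated title does not.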

arXiv information

Authors	Xuemei Tang, Xufeng Duan, Zhenguang G. Cai
Published	2025-04-23 07:09:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

The specialized vocabulary and complex concepts of the telecommunications industry present significant challenges for standard Natural Language Processing models. Generic text embeddings often fail to capture telecom-specific semantics, hindering downstream task performance. We introduce T-VEC (Telecom Vectorization Model), a novel embedding model tailored for the telecom domain through deep fine-tuning. Developed by NetoAI, T-VEC is created by adapting the state-of-the-art gte-Qwen2-1.5B-instruct model using a triplet loss objective on a meticulously curated, large-scale dataset of telecom-specific data. Crucially, this process involved substantial modification of weights across 338 layers of the base model, ensuring deep integration of domain knowledge, far exceeding superficial adaptation techniques. We quantify this deep change via weight difference analysis. A key contribution is the development and open-sourcing (MIT License) of the first dedicated telecom-specific tokenizer, enhancing the handling of industry jargon. T-VEC achieves a leading average MTEB score (0.825) compared to established models and demonstrates vastly superior performance (0.9380 vs. less than 0.07) on our internal telecom-specific triplet evaluation benchmark, indicating an exceptional grasp of domain-specific nuances, visually confirmed by improved embedding separation. This work positions NetoAI at the forefront of telecom AI innovation, providing the community with a powerful, deeply adapted, open-source tool.
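A triplet objective, as used to fine-tune T-VEC, pulls an anchor embedding toward a related (positive) text and pushes it away from an unrelated (negative) one by at least a margin. The abstract does not state the distance function or margin; the sketch below assumes Euclidean distance and a margin of 1.0, and operates on plain Python lists rather than model outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so training effort concentrates on triplets the model still confuses, such as two telecom terms that a generic embedding places too close together.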

arXiv information

Authors	Vignesh Ethiraj, Sidhanth Menon, Divya Vijay
Published	2025-04-23 07:10:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: 68T50, cs.AI, cs.CL

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

Posted: April 24, 2025 | Author: jarxiv

Abstract (original)

As the context limits of Large Language Models (LLMs) increase, the range of possible applications and downstream functions broadens. In many real-world tasks, decisions depend on details scattered across collections of often disparate documents containing mostly irrelevant information. Long-context LLMs appear well-suited to this form of complex information retrieval and reasoning, which has traditionally proven costly and time-consuming. However, although the development of longer context models has seen rapid gains in recent years, our understanding of how effectively LLMs use their context has not kept pace. To address this, we conduct a set of retrieval experiments designed to evaluate the capabilities of 17 leading LLMs, such as their ability to follow threads of information through the context window. Strikingly, we find that many models are remarkably threadsafe: capable of simultaneously following multiple threads without significant loss in performance. Still, for many models, we find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows. Our study also highlights the important point that token counts from different tokenizers should not be directly compared — they often correspond to substantially different numbers of written characters. We release our code and long-context experimental data.
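To make the "thread" idea concrete: a haystack can be a large set of key-value pairs in which one chain of pairs links a start key, through several hops, to a final value, while all other pairs are distractors. The sketch below builds such a haystack from random UUID-style strings and follows the thread by repeated lookup. This construction is a hypothetical illustration of the task shape described in the abstract; the paper's exact prompt format, sizes and scoring are not given here.

```python
import random
import uuid


def build_haystack(n_pairs, thread_len, seed=0):
    """Return (haystack, start_key, final_value).

    n_pairs distractor key -> value pairs are mixed with one thread of
    thread_len hops: start -> k1 -> ... -> final_value.
    """
    rng = random.Random(seed)

    def new_key():
        return str(uuid.UUID(int=rng.getrandbits(128)))

    haystack = {new_key(): new_key() for _ in range(n_pairs)}  # distractors
    thread = [new_key() for _ in range(thread_len + 1)]
    for src, dst in zip(thread, thread[1:]):
        haystack[src] = dst
    items = list(haystack.items())
    rng.shuffle(items)  # scatter the thread's pairs among the distractors
    return dict(items), thread[0], thread[-1]


def follow_thread(haystack, start):
    """Follow key -> value links until reaching a value that is not a key."""
    key = start
    while key in haystack:
        key = haystack[key]
    return key
```

A program resolves the thread trivially; the experiments ask whether a model can do the same when the pairs are serialized into a context window of up to hundreds of thousands of tokens. Because tokenizers split such strings differently, the same haystack can have quite different token counts across models.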

arXiv information

Authors	Jonathan Roberts, Kai Han, Samuel Albanie
Published	2025-04-23 07:50:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL
