Selective Attention Improves Transformer




Unneeded elements in the attention’s context degrade performance. We introduce Selective Attention, a simple parameter-free change to the standard attention mechanism which reduces attention to unneeded elements. Selective attention improves language modeling performance across a variety of model sizes and context lengths. For example, a range of transformers trained with the language modeling objective on C4 with selective attention perform equivalently to standard transformers with ~2X more heads and parameters in their attention modules. Selective attention also allows decreasing the size of the attention’s context buffer, leading to meaningful reductions in the memory and compute requirements during inference. For example, transformers with 100M parameters trained on C4 with context sizes of 512, 1,024, and 2,048 need 16X, 25X, and 47X less memory for their attention module, respectively, when equipped with selective attention than those without it, at the same validation perplexity.
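
The abstract does not spell out the mechanism, so the following is only a minimal sketch under stated assumptions. A parameter-free, non-negative selection score (assumed here to be the ReLU of the existing attention logits) that earlier queries assign to a key is summed and subtracted from that key's logit for later queries, so elements flagged as unneeded receive less attention. The helper names, the choice never to mask the first token, and the rule that a token cannot mask itself are illustrative assumptions, not the authors' code.

```python
import math

def softmax(xs):
    """Numerically stable softmax; -inf entries receive exactly zero weight."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_logits(q, k):
    """Scaled dot-product logits with a causal mask (key j > query i gets -inf)."""
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    return [[sum(a * b for a, b in zip(q[i], k[j])) * scale if j <= i else -math.inf
             for j in range(n)] for i in range(n)]

def selective_attention_weights(q, k):
    """Causal attention weights after subtracting an accumulated selection penalty.

    Assumptions (illustrative only): selection scores are the ReLU of the same
    logits, the first token is never masked, and no token masks itself.
    """
    n = len(q)
    logits = causal_logits(q, k)
    s = [[max(0.0, logits[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        s[i][0] = 0.0  # keep the first token always visible
        s[i][i] = 0.0  # a token does not mask itself
    # f[i][j] = sum of s[r][j] over earlier queries r < i
    f = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        f[i] = [f[i - 1][j] + s[i - 1][j] for j in range(n)]
    return [softmax([logits[i][j] - f[i][j] for j in range(n)]) for i in range(n)]
```

With `q = [[1.0], [2.0], [1.0], [1.0]]` and `k = [[1.0], [2.0], [0.5], [1.0]]`, only the last query differs from standard causal attention: query 2 gives key 1 a positive selection score, so key 1's logit for query 3 drops from 2.0 to 0.0.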


Authors: Yaniv Leviathan, Matan Kalman, Yossi Matias
Published: 2024-10-03 17:27:30+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CL, cs.LG

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations




Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as ‘hallucinations’. Recent studies have demonstrated that LLMs’ internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this property significantly enhances error detection performance. Yet, we show that such error detectors fail to generalize across datasets, implying that — contrary to prior claims — truthfulness encoding is not universal but rather multifaceted. Next, we show that internal representations can also be used for predicting the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies. Lastly, we reveal a discrepancy between LLMs’ internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model’s internal perspective, which can guide future research on enhancing error analysis and mitigation.
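
A linear probe is a common way to read such truthfulness signals from hidden states. The sketch below assumes the informative token is the last token of the exact answer span (found by exact token matching, with a fallback to the final token), that one layer's hidden states are given as plain lists, and that a logistic-regression probe is trained with plain SGD. All function names and the toy data are hypothetical, and per the abstract such probes may not transfer across datasets.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def find_answer_token(tokens, answer_tokens):
    """Index of the last token of the first exact match of answer_tokens, or None."""
    m = len(answer_tokens)
    for start in range(len(tokens) - m + 1):
        if tokens[start:start + m] == answer_tokens:
            return start + m - 1
    return None

def probe_features(hidden, tokens, answer_tokens):
    """Hidden state at the answer token, falling back to the final token."""
    idx = find_answer_token(tokens, answer_tokens)
    return hidden[len(tokens) - 1 if idx is None else idx]

def probe_score(x, w, b):
    """Probability that the answer is correct under a linear (logistic) probe."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def train_probe(xs, ys, lr=0.5, epochs=200):
    """Plain SGD on the log-loss; ys are 1 (correct) or 0 (hallucinated)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = probe_score(x, w, b) - y  # d(log-loss)/d(logit)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

For example, `find_answer_token(["The", "capital", "is", "Paris", "."], ["Paris"])` returns 3, so the probe reads the hidden state at "Paris" rather than at the trailing period.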


Authors: Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov
Published: 2024-10-03 17:31:31+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: 68T50, cs.AI, cs.CL, I.2.7

SteerDiff: Steering towards Safe Text-to-Image Diffusion Models




Text-to-image (T2I) diffusion models have drawn attention for their ability to generate high-quality images with precise text alignment. However, these models can also be misused to produce inappropriate content. Existing safety measures, which typically rely on text classifiers or ControlNet-like approaches, are often insufficient. Traditional text classifiers rely on large-scale labeled datasets and can be easily bypassed by rephrasing. As diffusion models continue to scale, fine-tuning these safeguards becomes increasingly challenging and inflexible. Recent red-teaming attack research further underscores the need for a new paradigm to prevent the generation of inappropriate content. In this paper, we introduce SteerDiff, a lightweight adaptor module designed to act as an intermediary between user input and the diffusion model, ensuring that generated images adhere to ethical and safety standards with little to no impact on usability. SteerDiff identifies and manipulates inappropriate concepts within the text embedding space to guide the model away from harmful outputs. We conduct extensive experiments across various concept unlearning tasks to evaluate the effectiveness of our approach. Furthermore, we benchmark SteerDiff against multiple red-teaming strategies to assess its robustness. Finally, we explore the potential of SteerDiff for concept forgetting tasks, demonstrating its versatility in text-conditioned image generation.
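
One simple way to manipulate a concept in embedding space is linear: estimate a concept direction from the difference between the mean embeddings of unsafe and safe prompts, then project that direction out when a prompt leans toward it. This is a swapped-in illustration, not SteerDiff's adaptor; the embeddings, the threshold, and the helper names are assumptions, and a real text encoder would produce per-token embeddings of much higher dimension.

```python
import math

def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def concept_direction(unsafe, safe):
    """Unit vector pointing from the mean safe embedding to the mean unsafe embedding."""
    diff = [u - s for u, s in zip(mean(unsafe), mean(safe))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("unsafe and safe examples have identical means")
    return [x / norm for x in diff]

def steer(embedding, direction, threshold=0.0):
    """Project out the concept direction when the embedding leans toward it."""
    proj = sum(e * d for e, d in zip(embedding, direction))
    if proj <= threshold:
        return list(embedding)  # not flagged: pass through unchanged
    return [e - proj * d for e, d in zip(embedding, direction)]
```

With `direction = [1.0, 0.0]`, `steer([2.0, 0.5], direction)` returns `[0.0, 0.5]`, while `[-1.0, 0.3]` passes through unchanged.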


Authors: Hongxiang Zhang, Yifeng He, Hao Chen
Published: 2024-10-03 17:34:55+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CR, cs.CV

Measurements with Noise: Bayesian Optimization for Co-optimizing Noise and Property Discovery in Automated Experiments




We have developed a Bayesian optimization (BO) workflow that integrates intra-step noise optimization into automated experimental cycles. Traditional BO approaches in automated experiments focus on optimizing experimental trajectories but often overlook the impact of measurement noise on data quality and cost. Our proposed framework simultaneously optimizes both the target property and the associated measurement noise by introducing time as an additional input parameter, thereby balancing the signal-to-noise ratio and experimental duration. Two approaches are explored: a reward-driven noise optimization and a double-optimization acquisition function, both enhancing the efficiency of automated workflows by considering noise and cost within the optimization process. We validate our method through simulations and real-world experiments using Piezoresponse Force Microscopy (PFM), demonstrating the successful optimization of measurement duration and property exploration. Our approach offers a scalable solution for optimizing multiple variables in automated experimental workflows, improving data quality, and reducing resource expenditure in materials science and beyond.
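
To see why measurement time can act as an input that trades noise against cost, consider a toy model (an assumption, not the paper's reward-driven or double-optimization acquisition function): averaging independent noise for a duration t leaves a variance of σ₀²/t, and each unit of time costs λ. Maximizing R(t) = -σ₀²/t - λt gives dR/dt = σ₀²/t² - λ = 0, so t* = σ₀/√λ. The sketch checks a grid search against that closed form; `sigma0`, `lam`, and the function names are hypothetical.

```python
import math

def noise_variance(t, sigma0):
    """Assumed model: averaging over duration t shrinks the variance as sigma0^2 / t."""
    return sigma0 ** 2 / t

def reward(t, sigma0, lam):
    """Toy trade-off: penalize residual noise variance plus a linear time cost lam * t."""
    return -noise_variance(t, sigma0) - lam * t

def best_duration(candidates, sigma0, lam):
    """Pick the candidate duration with the highest toy reward."""
    return max(candidates, key=lambda t: reward(t, sigma0, lam))

def optimal_duration(sigma0, lam):
    """Closed form from dR/dt = sigma0^2 / t^2 - lam = 0."""
    return sigma0 / math.sqrt(lam)
```

For `sigma0 = 2.0` and `lam = 0.25`, the closed form gives t* = 4.0, and `best_duration([0.5, 1.0, 2.0, 4.0, 8.0, 16.0], 2.0, 0.25)` also picks 4.0 (reward -2.0, versus -2.5 at both 2.0 and 8.0).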


Authors: Boris N. Slautin, Yu Liu, Jan Dec, Vladimir V. Shvartsman, Doru C. Lupascu, Maxim Ziatdinov, Sergei V. Kalinin
Published: 2024-10-03 17:38:43+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cond-mat.mtrl-sci, cs.AI, cs.LG

Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud




Unsupervised Domain Adaptation (UDA) is crucial for reducing the need for extensive manual data annotation when training deep networks on point cloud data. A significant challenge of UDA lies in effectively bridging the domain gap. To tackle this challenge, we propose Curvature Diversity-Driven Nuclear-Norm Wasserstein Domain Alignment (CDND). Our approach first introduces a Curvature Diversity-driven Deformation Reconstruction (CurvRec) task, which effectively mitigates the gap between the source and target domains by enabling the model to extract salient features from semantically rich regions of a given point cloud. We then propose Deformation-based Nuclear-norm Wasserstein Discrepancy (D-NWD), which applies the Nuclear-norm Wasserstein Discrepancy to both deformed and original data samples to align the source and target domains. Furthermore, we provide a theoretical justification for the effectiveness of D-NWD in distribution alignment and show that it is generic enough to be applied to any deformation. To validate our method, we conduct extensive experiments on two public domain adaptation datasets for point cloud classification and segmentation tasks. Empirical results show that CDND achieves state-of-the-art performance, outperforming existing approaches by a noticeable margin.


Authors: Mengxi Wu, Hao Huang, Yi Fang, Mohammad Rostami
Published: 2024-10-03 17:39:55+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CV

Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization




Large Language Models (LLMs) are pre-trained on large-scale corpora and excel in numerous general natural language processing (NLP) tasks, such as question answering (QA). Despite their advanced language capabilities, when it comes to domain-specific and knowledge-intensive tasks, LLMs suffer from hallucinations, knowledge cut-offs, and a lack of knowledge attribution. Additionally, fine-tuning LLMs’ intrinsic knowledge to highly specific domains is an expensive and time-consuming process. Retrieval-augmented generation (RAG) has recently emerged as a method for optimizing LLM responses by grounding them in a predetermined ontology. It has been shown that using a Knowledge Graph (KG) ontology for RAG improves QA accuracy by taking into account relevant sub-graphs that preserve the information in a structured manner. In this paper, we introduce SMART-SLIC, a highly domain-specific LLM framework that integrates RAG with a KG and a vector store (VS) that store factual, domain-specific information. Importantly, to avoid hallucinations in the KG, we build these highly domain-specific KGs and VSs without the use of LLMs, relying instead on NLP, data mining, and nonnegative tensor factorization with automatic model selection. Pairing our RAG with a domain-specific (i) KG (containing structured information) and (ii) VS (containing unstructured information) enables the development of domain-specific chatbots that attribute the source of information, mitigate hallucinations, lessen the need for fine-tuning, and excel in highly domain-specific question answering tasks. We pair SMART-SLIC with chain-of-thought prompting agents. The framework is designed to generalize to any specific or specialized domain. In this paper, we demonstrate the question answering capabilities of our framework on a corpus of scientific publications on malware analysis and anomaly detection.
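
The retrieval step that pairs a KG with a vector store can be sketched as follows. This is a minimal sketch, not SMART-SLIC's implementation: it assumes passage embeddings are precomputed by some external model, matches KG triples by naive substring search on subject names, and tags every context line with a source identifier so the LLM can attribute its answer. `Passage`, `build_prompt`, and the other names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Passage:
    source_id: str          # identifier used for source attribution
    text: str
    embedding: List[float]  # assumed to come from an external embedding model

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_passages(query_emb, store, k=2):
    """Top-k passages from the vector store by cosine similarity (unstructured context)."""
    return sorted(store, key=lambda p: cosine(query_emb, p.embedding), reverse=True)[:k]

def retrieve_triples(query, kg):
    """KG triples (subject, relation, object, source_id) whose subject appears in the query."""
    q = query.lower()
    return [t for t in kg if t[0].lower() in q]

def build_prompt(query, query_emb, store, kg, k=2):
    """Assemble KG and VS context, each line tagged with its source for attribution."""
    lines = ["Answer using only the context below and cite the bracketed sources."]
    lines += [f"[{src}] {s} {r} {o}" for s, r, o, src in retrieve_triples(query, kg)]
    lines += [f"[{p.source_id}] {p.text}" for p in retrieve_passages(query_emb, store, k)]
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```

The returned string would then be passed to the LLM, for example inside a chain-of-thought agent; the LLM call itself is left out because the abstract does not specify the model interface.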


Authors: Ryan C. Barron, Ves Grantcharov, Selma Wanna, Maksim E. Eren, Manish Bhattarai, Nicholas Solovyev, George Tompkins, Charles Nicholas, Kim Ø. Rasmussen, Cynthia Matuszek, Boian S. Alexandrov
Published: 2024-10-03 17:40:55+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CL, cs.IR, cs.SE

Large Language Models as Markov Chains




Large language models (LLMs) have proven to be remarkably efficient, both across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the origins of their impressive performance remains elusive. In this paper, we approach this challenging task by drawing an equivalence between generic autoregressive language models with a vocabulary of size $T$ and a context window of size $K$ and Markov chains defined on a finite state space of size $\mathcal{O}(T^K)$. We derive several surprising findings related to the existence of a stationary distribution of Markov chains that capture the inference power of LLMs, their speed of convergence to it, and the influence of the temperature on the latter. We then prove pre-training and in-context generalization bounds and show how the drawn equivalence allows us to enrich their interpretation. Finally, we illustrate our theoretical guarantees with experiments on several recent LLMs to highlight how they capture the behavior observed in practice.
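
The equivalence can be made concrete on a toy model. The sketch below assumes every state is a full window of K tokens (the paper's state space of size O(T^K) also accounts for shorter sequences), uses a hypothetical hand-written logit function in place of an LLM, builds the transition matrix with a temperature-scaled softmax, and finds the stationary distribution by power iteration.

```python
import math
from itertools import product

def next_token_probs(state, logits_fn, temperature):
    """Softmax of the model's next-token logits divided by the temperature."""
    logits = [x / temperature for x in logits_fn(state)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def transition_matrix(T, K, logits_fn, temperature=1.0):
    """States are full windows of K tokens, so there are T**K of them."""
    states = list(product(range(T), repeat=K))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for tok, p in enumerate(next_token_probs(s, logits_fn, temperature)):
            P[index[s]][index[s[1:] + (tok,)]] += p  # drop oldest token, append new one
    return states, P

def stationary(P, iters=1000):
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def toy_logits(state):
    """Hypothetical toy model over T = 2 tokens that favours repeating the last token."""
    return [1.0, 0.0] if state[-1] == 0 else [0.0, 2.0]
```

In this toy example, `transition_matrix(2, 2, toy_logits)` yields 2**2 = 4 states, and raising the temperature to 10.0 pushes the stationary distribution closer to uniform. With a realistic vocabulary and context window, T**K is far too large to enumerate, so the matrix is only a conceptual device at that scale.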


Authors: Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, Ievgen Redko
Published: 2024-10-03 17:45:31+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CL, cs.LG, stat.ML

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation




Inference-time computation is a powerful paradigm to enhance the performance of large language models (LLMs), with Best-of-N sampling being a widely used technique. However, this method is computationally expensive, requiring both (1) an external reward model and (2) the generation of multiple samples. In this work, we introduce a new generative self-evaluation scheme designed to adaptively reduce the number of generated samples while maintaining or even improving performance. We use a generative reward model formulation, allowing the LLM to predict mid-generation the probability that restarting the generation will yield a better response. These predictions are obtained without an external reward model and can be used to decide whether or not to generate more samples, prune unpromising samples early on, or to pick the best sample. This capability is very inexpensive as it involves generating a single predefined token. Trained using a dataset constructed with real unfiltered LMSYS user prompts, Llama 3.1 8B’s win rate against GPT-4 on AlpacaEval increases from 21% to 34% with 16 samples and math performance on GSM8K improves from 84% to 91%. By sampling only when the LLM determines that it is beneficial to do so and adaptively adjusting temperature annealing, we demonstrate that 74% of the improvement from using 16 samples can be achieved with only 1.2 samples on average. We further demonstrate that 50-75% of samples can be pruned early in generation with minimal degradation in performance. Overall, our methods enable more efficient and scalable compute utilization during inference for LLMs.
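
The adaptive-sampling and pruning logic can be sketched independently of any particular model. In this minimal sketch, `generate` and `p_better` are hypothetical callables: `p_better` stands for the model's predicted probability, read from a single predefined token, that restarting would produce a better response. Sampling stops once the best sample so far falls below an assumed threshold, and pruning keeps the partial generations with the lowest predicted probability. Temperature annealing is omitted.

```python
from typing import Callable, List, Optional, Tuple

def adaptive_best_of_n(
    generate: Callable[[int], str],      # hypothetical: returns a sample for a given seed
    p_better: Callable[[str], float],    # hypothetical: P(restart yields a better response)
    max_samples: int = 16,
    threshold: float = 0.5,
) -> Tuple[Optional[str], int]:
    """Sample until the model's own estimate says restarting is unlikely to help."""
    best, best_p, used = None, float("inf"), 0
    for seed in range(max_samples):
        sample = generate(seed)
        used += 1
        p = p_better(sample)
        if p < best_p:  # lower p means the model is more satisfied with this sample
            best, best_p = sample, p
        if best_p < threshold:
            break
    return best, used

def prune_partials(partials: List[str], p_better: Callable[[str], float],
                   keep_fraction: float = 0.5) -> List[str]:
    """Keep the partial generations the model considers most promising."""
    ranked = sorted(partials, key=p_better)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]
```

For instance, if successive samples are "bad", "ok", and "great" with predicted probabilities 0.9, 0.6, and 0.1, the loop stops after the third sample and returns "great"; with a threshold of 0.95 it stops after the first.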


Authors: Rohin Manvi, Anikait Singh, Stefano Ermon
Published: 2024-10-03 17:47:29+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CL, cs.LG

Unified Multi-Modal Interleaved Document Representation for Information Retrieval




Information Retrieval (IR) methods aim to identify relevant documents in response to a given query, which have gained remarkable attention due to their successful application in various natural language tasks. However, existing approaches typically consider only the textual information within the documents, which overlooks the fact that documents can contain multiple modalities, including texts, images, and tables. Further, they often segment each long document into multiple discrete passages for embedding, preventing them from capturing the overall document context and interactions between paragraphs. We argue that these two limitations lead to suboptimal document representations for retrieval. In this work, to address them, we aim to produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation. Moreover, to mitigate the information loss from segmenting documents into passages, instead of representing and retrieving passages individually, we further merge the representations of segmented passages into one single document representation, while we additionally introduce a reranking strategy to decouple and identify the relevant passage within the document if necessary. Then, through extensive experiments on diverse information retrieval scenarios considering both the textual and multimodal queries, we show that our approach substantially outperforms relevant baselines, thanks to the consideration of the multimodal information interleaved within the documents in a unified way.
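
The merge-then-locate idea can be sketched with plain vectors. The abstract does not say how passage representations are merged or how the reranker works, so this sketch assumes mean pooling for the merge and uses cosine similarity as a stand-in for the reranker; embeddings from the vision-language model are taken as given, and all names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_passages(passage_embs):
    """Assumed merge rule: mean-pool passage embeddings into one document embedding."""
    n = len(passage_embs)
    return [sum(col) / n for col in zip(*passage_embs)]

def retrieve_documents(query_emb, corpus, top_k=1):
    """corpus maps doc_id -> list of passage embeddings; rank whole documents."""
    doc_embs = {d: merge_passages(p) for d, p in corpus.items()}
    return sorted(doc_embs, key=lambda d: cosine(query_emb, doc_embs[d]), reverse=True)[:top_k]

def locate_passage(query_emb, passage_embs):
    """Stand-in reranker: index of the passage most similar to the query."""
    return max(range(len(passage_embs)), key=lambda i: cosine(query_emb, passage_embs[i]))
```

With `corpus = {"d1": [[1.0, 0.0], [0.0, 1.0]], "d2": [[0.0, 1.0], [0.0, 1.0]]}` and query `[1.0, 0.2]`, document `d1` is retrieved and its passage 0 is located as the relevant one.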


Authors: Jaewoo Lee, Joonho Ko, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang
Published: 2024-10-03 17:49:09+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: cs.AI, cs.CL, cs.IR

Custom Non-Linear Model Predictive Control for Obstacle Avoidance in Indoor and Outdoor Environments




Navigating complex environments requires Unmanned Aerial Vehicles (UAVs) and autonomous systems to perform trajectory tracking and obstacle avoidance in real time. While many control strategies have effectively utilized linear approximations, addressing the non-linear dynamics of UAVs, especially in obstacle-dense environments, remains a key challenge that requires further research. This paper introduces a Non-linear Model Predictive Control (NMPC) framework for the DJI Matrice 100, addressing these challenges by using a dynamic model and B-spline interpolation for smooth reference trajectories, ensuring minimal deviation while respecting safety constraints. The framework supports various trajectory types and employs a penalty-based cost function for control accuracy in tight maneuvers. The framework utilizes CasADi for efficient real-time optimization, enabling the UAV to maintain robust operation even under tight computational constraints. Simulation and real-world indoor and outdoor experiments demonstrated the NMPC's ability to adapt to disturbances, resulting in smooth, collision-free navigation.
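
Two ingredients named above, a B-spline reference trajectory and a penalty-based cost, can be sketched in isolation. This assumes 2D positions, a uniform cubic B-spline over hypothetical control points, and a stage cost made of squared tracking error plus a squared hinge penalty inside a safety radius around each obstacle; the weights `q`, `w`, and `r_safe` are illustrative. It is not the paper's cost function, and a full NMPC would minimize such costs over a prediction horizon subject to the UAV's dynamic model (the paper uses CasADi for that optimization).

```python
import math

def bspline_point(p0, p1, p2, p3, u):
    """Uniform cubic B-spline segment evaluated at u in [0, 1]; the basis sums to 1."""
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u**3 - 6 * u**2 + 4) / 6
    b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
    b3 = u**3 / 6
    return tuple(b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3
                 for x0, x1, x2, x3 in zip(p0, p1, p2, p3))

def reference_trajectory(ctrl, samples_per_segment=10):
    """Sample every 4-point segment of the control polygon into reference points."""
    pts = []
    for i in range(len(ctrl) - 3):
        for s in range(samples_per_segment):
            pts.append(bspline_point(*ctrl[i:i + 4], s / samples_per_segment))
    return pts

def stage_cost(x, x_ref, obstacles, q=1.0, w=100.0, r_safe=0.5):
    """Tracking error plus a quadratic penalty when closer than r_safe to an obstacle."""
    track = q * sum((a - b) ** 2 for a, b in zip(x, x_ref))
    penalty = sum(w * max(0.0, r_safe - math.dist(x, o)) ** 2 for o in obstacles)
    return track + penalty
```

Because the penalty is zero outside the safety radius and grows smoothly inside it, the cost stays differentiable, which suits gradient-based NMPC solvers; a hard constraint would be the stricter alternative.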


Authors: Lara Laban, Mariusz Wzorek, Piotr Rudol, Tommy Persson
Published: 2024-10-03 17:50:19+00:00
arXiv site: arxiv_id (pdf)

Provider, service used: DeepL

Categories: (Primary), 68T40, 93B52, C.4, cs.AI, cs.AR, cs.CE, cs.RO, cs.SY, eess.SY