Language Models Do Not Embed Numbers Continuously

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)

Abstract

Recent research has extensively studied how large language models manipulate integers in specific arithmetic tasks and, on a more fundamental level, how they represent numeric values. These previous works have found that language model embeddings can be used to reconstruct the original values; however, they do not evaluate whether language models actually model continuous values as continuous. Using expected properties of the embedding space, including linear reconstruction and principal component analysis, we show that language models not only represent numeric spaces as non-continuous but also introduce significant noise. Using models from three major providers (OpenAI, Google Gemini and Voyage AI), we show that while reconstruction is possible with high fidelity (R² ≥ 0.95), principal components explain only a minor share of variation within the embedding space. This indicates that many components within the embedding space are orthogonal to the simple numeric input space. Further, both linear reconstruction and explained variance suffer with increasing decimal precision, despite the ordinal nature of the input space being fundamentally unchanged. The findings of this work therefore have implications for the many areas where embedding models are used, in particular where high numerical precision, large magnitudes or mixed-sign values are common.
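
The two diagnostics named in the abstract, linear reconstruction (R²) and the share of variance explained by principal components, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the embeddings are a synthetic stand-in (one numeric direction plus isotropic noise), not output from any of the providers' models. It only shows how a high R² can coexist with a low explained-variance share.

```python
import numpy as np


def linear_reconstruction_r2(embeddings, values):
    """In-sample R^2 of an ordinary least-squares fit from embeddings to values."""
    # Append a column of ones so the fit includes an intercept.
    design = np.column_stack([embeddings, np.ones(len(embeddings))])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    residual = values - design @ coef
    centered = values - values.mean()
    return 1.0 - (residual @ residual) / (centered @ centered)


def explained_variance_ratio(embeddings, k):
    """Share of total variance captured by the first k principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Squared singular values of the centered data are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:k].sum() / variances.sum()


# Synthetic stand-in (an assumption for illustration, not real model output):
# the numeric value is encoded along one unit direction, and every dimension
# also carries independent noise.
rng = np.random.default_rng(0)
n_samples, n_dims = 1000, 256
values = np.linspace(-1.0, 1.0, n_samples)
direction = rng.normal(size=n_dims)
direction /= np.linalg.norm(direction)
embeddings = np.outer(values, direction) + 0.1 * rng.normal(size=(n_samples, n_dims))

r2 = linear_reconstruction_r2(embeddings, values)
evr = explained_variance_ratio(embeddings, k=2)
print(f"linear reconstruction R^2: {r2:.3f}")
print(f"variance explained by first 2 PCs: {evr:.3f}")
```

In this toy setting the value is linearly recoverable, yet most of the variance lies in directions unrelated to it, which is the qualitative pattern the abstract describes. With real embeddings, a held-out split would be a more careful way to estimate R².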
Original language: English
Title of host publication: Proceedings of the 2nd AAAI Workshop on XAI4Science
Subtitle of host publication: From Understanding Model Behavior to Discovering New Scientific Knowledge
Place of Publication: Singapore
Publisher: openreview.net
Pages: 1-8
Number of pages: 8
Publication status: Accepted/In press - 20 Jan 2026
Event: AAAI Conference on Artificial Intelligence - Singapore EXPO, Singapore, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
Conference number: 40
https://aaai.org/conference/aaai/aaai-26/

Conference

Conference: AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI 2026
Country/Territory: Singapore
City: Singapore
Period: 20/01/26 to 27/01/26
