Quantifying Semantic Similarity Across Languages

Bill Thompson, Sean G Roberts, Gary Lupyan

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)



Do all languages convey semantic knowledge in the same way? If language simply mirrors the structure of the world, the answer should be a qualified “yes”. If, however, languages impose structure as much as reflecting it, then even ostensibly the “same” word in different languages may mean quite different things. We provide a first pass at a large-scale quantification of cross-linguistic semantic alignment of approximately 1000 meanings in 55 languages. We find that the translation equivalents in some domains (e.g., Time, Quantity, and Kinship) exhibit high alignment across languages while the structure of other domains (e.g., Politics, Food, Emotions, and Animals) exhibits substantial cross-linguistic variability. Our measure of semantic alignment correlates with known phylogenetic distances between languages: more phylogenetically distant languages have less semantic alignment. We also find semantic alignment to correlate with cultural distances between societies speaking the languages, suggesting a rich co-adaptation of language and culture even in domains of experience that appear most constrained by the natural world.
Original language: English
Title of host publication: Proceedings of the 40th Annual Conference of the Cognitive Science Society
Subtitle of host publication: CogSci 2018
Publisher: Cognitive Science Society
Number of pages: 6
ISBN (Print): 978-0-9911967-8-4
Publication status: Published - 28 Jul 2018
Event: Annual Meeting of the Cognitive Science Society - Madison, United States
Duration: 25 Jul 2018 to 28 Jul 2018
Conference number: 40


Conference: Annual Meeting of the Cognitive Science Society
Abbreviated title: CogSci2018
Country: United States


  • word meanings
  • distributional semantics
  • word2vec
  • culture
  • language
  • relativity
