Testing Heaps' law for cities using administrative and gridded population data sets

Filippo Simini*, Charlotte James

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review



Since 2008 the number of individuals living in urban areas has surpassed that of rural areas, and in the coming decades urbanisation is expected to increase further, especially in developing countries. A country’s urbanisation depends on both the distribution of city sizes, which describes the fraction of cities with a given population (or area), and the overall number of cities in the country. Here we present empirical evidence suggesting the validity of Heaps’ law for cities: the expected number of cities in a country is a function only of the country’s total population (or built-up area) and the distribution of city sizes. This implies the absence of correlations in the spatial distribution of cities. We show that this result holds at the country scale, using the official administrative definition of cities provided by the GeoNames dataset, and at the local scale, for areas of 128 × 128 km² in the United States, using a morphological definition of urban clusters obtained from the Global Rural-Urban Mapping Project (GRUMP) dataset. We also derive a general theoretical result, applicable to all systems characterised by a Zipf distribution of group sizes, that describes the relationship between the expected number of groups (cities) and the total number of elements in all groups (population), providing further insights into the relationship between Zipf’s law and Heaps’ law for finite-size systems.
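The idea that the number of cities depends only on total population and the size distribution can be illustrated with a simple uncorrelated null model. The sketch below is a hypothetical illustration, not the authors' method or their theoretical result: it assumes city sizes are drawn independently from a Pareto distribution with P(S > s) = (s_min / s)^alpha (Zipf's law corresponds to alpha = 1), keeps drawing until the cumulative population reaches P, and counts how many cities were needed. The function names `count_cities` and `expected_cities` and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def count_cities(total_population, alpha=1.0, s_min=1.0, rng=None):
    """Hypothetical null model: draw i.i.d. city sizes from a Pareto
    distribution with P(S > s) = (s_min / s)**alpha for s >= s_min
    (Zipf's law when alpha = 1) until their sum reaches total_population.
    Returns the number of cities drawn. No spatial correlations are
    modelled, so the count depends only on total_population and the
    size distribution."""
    rng = rng or random.Random()
    n_cities, cumulative = 0, 0.0
    while cumulative < total_population:
        # random.paretovariate(alpha) returns values >= 1, so scale by s_min.
        cumulative += s_min * rng.paretovariate(alpha)
        n_cities += 1
    return n_cities


def expected_cities(total_population, alpha=1.0, s_min=1.0, trials=50, seed=0):
    """Monte Carlo estimate of the expected number of cities for a given
    total population, averaged over independent realisations."""
    rng = random.Random(seed)
    samples = [count_cities(total_population, alpha, s_min, rng)
               for _ in range(trials)]
    return statistics.mean(samples)


if __name__ == "__main__":
    for p in (1e2, 1e3, 1e4, 1e5):
        print(f"P = {p:>8.0f}  ->  E[N_cities] ~ {expected_cities(p):.1f}")
```

In this toy model with alpha = 1 the mean city size diverges, so the simulated number of cities grows more slowly than linearly with total population, which is the kind of sublinear, Heaps-like behaviour the abstract relates to Zipf's law in finite-size systems.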

Original language: English
Article number: 24
Number of pages: 13
Journal: EPJ Data Science
Issue number: 24
Early online date: 5 Jul 2019
Publication status: Published - 1 Dec 2019


  • Cities
  • Heaps’ law
  • Scaling
  • Urbanisation
  • Zipf’s law

