Granular IoT Device Identification Using TF-IDF and Cosine Similarity

Ash Andrews, George Oikonomou, Simon M D Armour, Paul Thomas, Thomas Cattermole

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)


Internet of Things (IoT) devices are becoming more prevalent in home environments and have been shown to be generally insecure. Many previous studies have sought to identify unknown IoT devices on networks. To truly secure a network, however, unknown devices need to be identified down to the granularity of firmware version, a problem previous studies have not solved. As devices change versions, we expect subtle differences in their on-wire signatures that would be hard for a human analyst to notice but easy for a natural language processing (NLP) technique to identify. In this paper we extract keywords from both encrypted and unencrypted network traffic and first use UMAP with K-Means clustering to visualise the data, showing that natural clusters form across our test dataset of 18 devices covering 61 versions. This analysis suggests that the extracted keywords contain underlying patterns that machine learning techniques could detect. We then show that these patterns can be detected by proposing a novel technique, using TF-IDF and cosine similarity, that follows the clustering results to identify IoT devices down to the level of firmware version. We show that our chosen features are strong enough to work accurately across a range of device types, manufacturers, models and versions, and note the main observations made when trying to identify devices down to a firmware version. This approach achieves an accuracy of 67% at firmware-version granularity without compromising device-model identification, where it achieves an accuracy of 90%.
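The abstract names TF-IDF weighting and cosine similarity as the core of the matching step. As a minimal sketch, assuming each known (device, firmware version) pair is represented by a bag of keywords already extracted from its traffic, and that an unknown device is assigned the label of its most similar reference, the idea might look like the Python below. Each term is weighted as tf(t, d) x idf(t), and two vectors u and v are compared with cos(u, v) = (u . v) / (|u| |v|). The keyword lists, labels, the smoothed IDF variant and the "pick the best match" rule are all hypothetical choices for illustration; the paper's actual keyword extraction, weighting and decision rule may differ.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, over the reference documents.
    (Assumed variant; the paper's exact IDF formula is not given here.)"""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Raw term frequency times IDF; terms never seen in the references are dropped."""
    tf = Counter(doc)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def identify(unknown, references):
    """Return the reference label most similar to the unknown keyword bag,
    plus the similarity score for every label (hypothetical decision rule)."""
    labels = list(references)
    idf = idf_weights([references[label] for label in labels])
    ref_vecs = {label: tfidf_vector(references[label], idf) for label in labels}
    query = tfidf_vector(unknown, idf)
    scores = {label: cosine(query, ref_vecs[label]) for label in labels}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical keyword bags keyed by (device, firmware version); not real data.
references = {
    ("camera-a", "1.0.2"): ["rtsp", "tls1.2", "ntp", "cloud.example", "upnp"],
    ("camera-a", "1.1.0"): ["rtsp", "tls1.3", "ntp", "cloud.example"],
    ("plug-b", "2.4"): ["mqtt", "tls1.2", "dns", "plug.example"],
}
unknown = ["rtsp", "ntp", "ntp", "tls1.3", "cloud.example"]

best, scores = identify(unknown, references)
print(best, scores)
```

Here the unknown capture shares most keywords with both camera firmware versions, but the version-specific term ("tls1.3" in this toy data) carries a higher IDF weight because it appears in fewer references, which pushes the match towards version 1.1.0. Down-weighting keywords common to many versions and up-weighting rare ones is the property that makes TF-IDF a plausible fit for separating closely related firmware versions.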
Original language: English
Title of host publication: CPSIoTSec '23
Subtitle of host publication: Proceedings of the 5th Workshop on CPS&IoT Security and Privacy
Publisher: Association for Computing Machinery (ACM)
Number of pages: 9
ISBN (Electronic): 9798400702549
ISBN (Print): 9798400702549
Publication status: Published - 26 Nov 2023
Event: 5th Joint Workshop on CPS & IoT Security and Privacy - Tivoli Congress Center, Copenhagen, Denmark
Duration: 26 Nov 2023 to 26 Nov 2023


Workshop: 5th Joint Workshop on CPS & IoT Security and Privacy
Abbreviated title: CPSIoTSec 2023

Bibliographical note

Publisher Copyright: © 2023 Owner/Author.

