Cluster Analysis of Traffic Flows on a Campus Network

A. Karim, I. Ahmad, S.I. Jami, and M. Sarwar (Pakistan)


Data Mining, Clustering, Network Traffic Analysis, IP Addresses


University campus networks generate large quantities of traffic flow data. These data record the source and destination of each flow as IP addresses. Cluster analysis of such data can reveal knowledge useful for web cache design, user profiling, and network resource management. However, popular clustering algorithms such as k-means and DBSCAN are not directly applicable to datasets containing IP addresses, and such generic algorithms can yield results that are difficult to interpret. This paper presents a cluster analysis of network traffic flows using a hybrid clustering algorithm that integrates the longest prefix matching concept of TCP/IP traffic routing with the nearest neighbor algorithm. The similarity between two IP addresses is determined by their longest prefix match, and similar IP addresses are then grouped together by an adapted version of the nearest neighbor algorithm. The clustering is automatic: it does not require input parameters such as the desired number of clusters or a similarity threshold value. Furthermore, the algorithm yields ‘natural’ clusters consistent with the characteristics and usage of IP addresses. The test results were verified using nslookup, and about 90% of the clusters were correctly identified by the algorithm.
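
The two ideas combined in the abstract, prefix-based similarity and nearest neighbor grouping, can be illustrated with a minimal Python sketch. This is not the authors' adapted algorithm, whose details the abstract does not give. It assumes IPv4 addresses, measures similarity as the number of leading bits two addresses share, and links each address to its single most similar neighbor, taking the resulting connected groups as clusters. The function names `common_prefix_len` and `nearest_neighbor_clusters`, and the rule that addresses sharing no leading bits are never linked, are assumptions made for illustration.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position that differs.
    return 32 - diff.bit_length()


def nearest_neighbor_clusters(addrs):
    """Group addresses by linking each one to its most similar neighbor.

    Illustrative assumption: clusters are the connected groups formed by
    these links, so no cluster count or threshold is supplied by the caller.
    """
    addrs = sorted(set(addrs), key=lambda s: int(ipaddress.IPv4Address(s)))
    parent = {a: a for a in addrs}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in addrs:
        others = [b for b in addrs if b != a]
        if not others:
            continue
        best = max(others, key=lambda b: common_prefix_len(a, b))
        # Assumption: addresses with no shared leading bit stay unlinked.
        if common_prefix_len(a, best) > 0:
            parent[find(a)] = find(best)

    clusters = {}
    for a in addrs:
        clusters.setdefault(find(a), []).append(a)
    return sorted(clusters.values())


if __name__ == "__main__":
    sample = ["192.168.1.1", "192.168.1.2", "10.0.0.5", "10.0.0.9"]
    for cluster in nearest_neighbor_clusters(sample):
        print(cluster)
```

On the four sample addresses, the two 10.0.0.x hosts link to each other and the two 192.168.1.x hosts link to each other, giving two clusters. Linking every address to its closest neighbor can chain loosely related addresses into one group, which is presumably one of the issues an adapted nearest neighbor method has to handle.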