Time series event correlation with DTW and Hierarchical Clustering methods

Unaffiliated, Bengaluru, India
Akamai Technologies, Bengaluru, India
Network Planning, Akamai Technologies, Bengaluru, India
DOI
10.7287/peerj.preprints.27959v1
Subject Areas
Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Mining and Machine Learning, Data Science
Keywords
Correlation, Causation, Clustering techniques, Nearest Neighbor, Hierarchical Clustering, Two Sample Tests, Dynamic Time Warping, Event correlation, Time Series
Copyright
© 2019 Mishra et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Mishra S, Shafi Z, Pathak S. 2019. Time series event correlation with DTW and Hierarchical Clustering methods. PeerJ Preprints 7:e27959v1

Abstract

Data-driven decision making is becoming an increasingly important part of successful business execution. More and more organizations are making informed decisions based on the data they generate. Most of this data is temporal, that is, time series data. Analyzing multiple time series data sets both effectively and quickly is a challenge. The most interesting and valuable part of such analysis is generating insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two methods, the Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned by both can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of its subject domain and idiosyncrasies; in short, the approach is data agnostic.
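
As an illustration only, and not the authors' implementation, the two building blocks named in the abstract can be sketched in plain Python: a Dynamic Time Warping (DTW) distance between two series, and an agglomerative (hierarchical) clustering that groups series by that distance. The function names `dtw_distance` and `agglomerative_clusters`, the sample series, the average-linkage rule, and the threshold of 5 are all assumptions made for this sketch; the Two Sample Test step is not shown.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Textbook DTW distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def agglomerative_clusters(series, threshold):
    """Average-linkage agglomerative clustering over pairwise DTW distances.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical cut-off chosen for this sketch).
    """
    names = list(series)
    dist = {frozenset((p, q)): dtw_distance(series[p], series[q])
            for p, q in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical metrics: "latency" is "cpu" delayed by one time step.
series = {
    "cpu":     [0, 0, 1, 5, 9, 5, 1, 0, 0, 0],
    "latency": [0, 0, 0, 1, 5, 9, 5, 1, 0, 0],
    "disk":    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
}
clusters = agglomerative_clusters(series, threshold=5)
print(clusters)
```

In this toy data the delayed spike is absorbed by the warping, so `cpu` and `latency` have a DTW distance of zero and end up in the same cluster, while the flat `disk` series stays on its own. A point-by-point comparison would treat the shifted spike as a mismatch, which is the motivation for using DTW on events that are correlated but offset in time.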

Author Comment

The current version is the first account of how these methods are applied. The application can be enhanced further, and its performance can be fine-tuned in future versions and tested on new types of data sets. The current version primarily uses data sets from servers, web applications and similar sources. I intend to apply this implementation to further data sets in a data-source-agnostic manner, and I may therefore update my findings in later versions of this paper.