Time series event correlation with DTW and Hierarchical Clustering methods
- Published
- Accepted
- Subject Areas
- Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Mining and Machine Learning, Data Science
- Keywords
- Correlation, Causation, Clustering techniques, Nearest Neighbor, Hierarchical Clustering, Two Sample Tests, Dynamic Time Warping, Event correlation, Time Series
- Copyright
- © 2019 Mishra et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2019. Time series event correlation with DTW and Hierarchical Clustering methods. PeerJ Preprints 7:e27959v1 https://doi.org/10.7287/peerj.preprints.27959v1
Abstract
Data-driven decision making is becoming an increasingly important aspect of successful business execution. More and more organizations are moving towards making informed decisions based on the data they generate. Most of this data is temporal, that is, time series data. Analyzing multiple time series data sets quickly and efficiently is a challenge. The most interesting and valuable part of such analysis is generating insights into correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two methods, the Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned by both can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of its subject domain and idiosyncrasies; in other words, the approach is data agnostic.
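As a rough illustration of the Dynamic Time Warping (DTW) building block named in the abstract, the following is a minimal sketch of the textbook DTW distance with an absolute-difference cost, plus a pairwise distance matrix over several series. It is not the paper's implementation: the function names `dtw_distance` and `pairwise_dtw`, the cost function, and the sample series are assumptions made for illustration only.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW distance (assumed cost: absolute difference)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between every pair of series."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


# Hypothetical sample data: a load curve, a time-shifted copy, and a flat series.
load = [0, 1, 3, 1, 0, 0]
shifted = [0, 0, 1, 3, 1, 0]
flat = [1, 1, 1, 1, 1, 1]
matrix = pairwise_dtw([load, shifted, flat])
```

A distance matrix of this kind is the sort of input an agglomerative (hierarchical) clustering routine could consume, for example SciPy's `scipy.cluster.hierarchy.linkage` after converting the matrix to condensed form. Whether the paper proceeds exactly this way is not stated in the abstract.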
Author Comment
The current version is the first explanation of how the methods are applied. The application can be further enhanced, and performance factors can be fine-tuned, in future versions, along with new types of data sets. The current version primarily uses data sets from servers, web applications and similar sources. I intend to apply this implementation in a data-source-agnostic manner to further data sets, and I may therefore update my findings in later versions of this paper.