TwitterNews: Real time event detection from the Twitter data stream

Department of Computing, Macquarie University, Sydney, New South Wales, Australia
DOI: 10.7287/peerj.preprints.2297v1
Subject Areas: Artificial Intelligence, Data Mining and Machine Learning, Social Computing, World Wide Web and Web Science
Keywords: Event detection, Twitter, Microblog, Incremental clustering, Locality sensitive hashing, Random indexing
Copyright: © 2016 Hasan et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: Hasan M, Orgun MA, Schwitter R. 2016. TwitterNews: Real time event detection from the Twitter data stream. PeerJ Preprints 4:e2297v1

Abstract

Research on event detection from Twitter streaming data has gained momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews provides a novel approach that combines a random indexing based term vector model with locality sensitive hashing, which aids in performing incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews is then filtered to report the newsworthy events, along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, in terms of recall and precision, using a publicly available corpus.
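
To make the combination of random indexing and locality sensitive hashing concrete, the following is a minimal sketch of how the two techniques can support incremental clustering. It is not the TwitterNews implementation: the class and function names, the random-hyperplane hashing scheme, and the values of DIM, NONZERO, NUM_BITS and THRESHOLD are all assumptions chosen for illustration. The abstract does not specify these details, and the fragmentation handling and newsworthiness filtering it mentions are not covered here.

```python
import hashlib
import math
import random
from collections import defaultdict

# All four constants are illustrative assumptions, not values from the paper.
DIM = 256        # dimensionality of the reduced vector space
NONZERO = 8      # number of +1/-1 entries in each term's index vector
NUM_BITS = 8     # number of random hyperplanes (bits per LSH signature)
THRESHOLD = 0.5  # cosine similarity needed to join an existing cluster


def index_vector(term):
    """Sparse ternary index vector, derived deterministically from the term."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def tweet_vector(text):
    """Random indexing step: sum the index vectors of the tweet's terms."""
    vec = [0.0] * DIM
    for term in text.lower().split():
        for i, value in enumerate(index_vector(term)):
            vec[i] += value
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Hypothetical helper: assigns each incoming tweet to a cluster id."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(DIM)]
                       for _ in range(NUM_BITS)]
        self.buckets = defaultdict(list)  # signature -> [(vector, cluster_id)]
        self.next_id = 0

    def signature(self, vec):
        """LSH step: one bit per hyperplane, set by the sign of the dot product."""
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, text):
        vec = tweet_vector(text)
        sig = self.signature(vec)
        best_id, best_sim = None, THRESHOLD
        # Only tweets in the same bucket are compared, not the whole stream.
        for other, cluster_id in self.buckets[sig]:
            sim = cosine(vec, other)
            if sim >= best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is None:  # no sufficiently similar tweet: start a new cluster
            best_id = self.next_id
            self.next_id += 1
        self.buckets[sig].append((vec, best_id))
        return best_id
```

In this sketch, calling `add` on each tweet as it arrives returns a cluster id. Because a tweet is compared only against the contents of its own hash bucket, the cost per tweet depends on bucket size rather than on the number of tweets seen so far. A single hash table can also place similar tweets in different buckets, which is one way fragmented clusters can arise in schemes of this kind.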

Author Comment

This is a preprint submission to PeerJ Preprints.