Streaming stochastic variational Bayes: An improved approach for inference with concept drifting data streams

CTO Office, WSO2, Colombo 03, Sri Lanka
DOI
10.7287/peerj.preprints.27790v2
Subject Areas
Artificial Intelligence, Data Mining and Machine Learning
Keywords
Online Learning, Variational Inference, Black-box Inference, Probabilistic Models, Bayesian Learning, Classification, Regression, Concept Drifts, Data Streams, Posterior
Copyright
© 2019 Jihan et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Jihan N, Jayasinghe M, Perera S. 2019. Streaming stochastic variational Bayes: An improved approach for inference with concept drifting data streams. PeerJ Preprints 7:e27790v2

Abstract

Online learning is an essential tool for predictive analysis based on continuous, endless data streams. Adopting Bayesian inference in online settings allows hierarchical modeling while representing the uncertainty of model parameters. Existing online inference techniques are motivated by either traditional Bayesian updating or stochastic optimization. However, traditional Bayesian updating suffers from overconfident posteriors: the posterior variance becomes too small for the posterior to adapt to changes in concept-drifting data streams. Stochastic optimization of the variational objective, on the other hand, demands exhaustive additional analysis to tune a hyperparameter that controls the posterior variance. In this paper, we present "Streaming Stochastic Variational Bayes" (SSVB), a novel online approximate inference framework for data streams that addresses these shortcomings of the current state of the art. SSVB adjusts its posterior variance appropriately, without any user-specified hyperparameters, while efficiently accommodating drifting patterns in the posteriors. Moreover, practitioners can easily adopt SSVB for a wide range of models (e.g. from simple regression models to complex hierarchical models) with little additional analysis. We demonstrate the superior performance of SSVB against Population Variational Inference (PVI), Stochastic Variational Inference (SVI) and Black-box Streaming Variational Bayes (BB-SVB) using two non-conjugate probabilistic models: multinomial logistic regression and a linear mixed effect model. Furthermore, we highlight the significant accuracy gain of SSVB-based inference over conventional online learning models for each task.
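The overconfidence problem attributed above to traditional Bayesian updating can be illustrated with a minimal sketch. It is not SSVB or any method from the paper. It assumes a toy conjugate model (a Gaussian stream with known noise variance and a Gaussian prior on the unknown mean) and an abrupt drift of the mean from 0 to 5; the function names, prior and stream sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def bayes_update(mu, var, x, noise_var):
    """One conjugate Normal-Normal update of the posterior over an unknown mean.

    (mu, var) is the current Gaussian posterior; x is a single observation
    drawn with known variance noise_var. Returns the updated (mu, var).
    """
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mu = post_var * (mu / var + x / noise_var)
    return post_mu, post_var


def run_stream(n_before=500, n_after=50, shift=5.0, noise_var=1.0, seed=0):
    """Feed a stream whose true mean jumps from 0 to `shift` (assumed drift)."""
    rng = np.random.default_rng(seed)
    stream = np.concatenate([
        rng.normal(0.0, np.sqrt(noise_var), n_before),   # before the drift
        rng.normal(shift, np.sqrt(noise_var), n_after),  # after the drift
    ])
    mu, var = 0.0, 10.0  # hypothetical broad prior N(0, 10)
    for x in stream:
        # Yesterday's posterior becomes today's prior; the variance only shrinks.
        mu, var = bayes_update(mu, var, x, noise_var)
    return mu, var


if __name__ == "__main__":
    mu, var = run_stream()
    print(f"posterior mean = {mu:.3f}, posterior variance = {var:.5f}")
```

With these assumed settings the posterior variance shrinks to roughly 0.002 after 550 observations, so the 50 post-drift observations move the posterior mean only a small fraction of the way toward the new value of 5, while the posterior remains sharply concentrated around the outdated estimate.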

Author Comment

In this version, we have substantially improved the paper, adding further analysis that evaluates the proposed approach against traditional Bayesian updating.

Moreover, we have improved the related work section and evaluation section.

Furthermore, we have reviewed the content and reduced the number of pages to improve readability.

The appendices are now provided as supplemental files.

Supplemental Information

Appendices to the paper, presenting additional proofs, model specifications and experiments

DOI: 10.7287/peerj.preprints.27790v2/supp-1