# Closed-Form Estimation of Multiple Change-Point Models

- Published
- Accepted

- Subject Areas
- Statistics
- Keywords
- change-point analysis, Bayesian statistics, time series analysis, marginal likelihood, model selection

- Copyright
- © 2013 Jensen
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

- Cite this article
- 2013. Closed-Form Estimation of Multiple Change-Point Models. PeerJ PrePrints 1:e90v3 https://doi.org/10.7287/peerj.preprints.90v3

## Abstract

Identifying discontinuities (or *change-points*) in otherwise stationary time series is a powerful analytic tool. This paper outlines a general strategy for identifying an unknown number of change-points using elementary principles of Bayesian statistics. Using a strategy of *binary partitioning by marginal likelihood*, a time series is recursively subdivided on the basis of whether adding divisions (and thus increasing model complexity) yields a justified improvement in the marginal model likelihood. When this approach is combined with the use of conjugate priors, it yields the *Conjugate Partitioned Recursion* (CPR) algorithm, which identifies change-points without computationally intensive numerical integration. Using the CPR algorithm, methods are described for specifying change-point models drawn from a host of familiar distributions, both discrete (binomial, geometric, Poisson) and continuous (exponential, Gaussian, uniform, and multiple linear regression), as well as multivariate distribution (multinomial, multivariate normal, and multivariate linear regression). Methods by which the CPR algorithm could be extended or modified are discussed, and several detailed applications to data published in psychology and biomedical engineering are described.

## Author Comment

See also the supplemental material, for further analytic examples, a sensitivity analysis of the described algorithm, implementation of two operational modifications, further mathematical support, and a Matlab package that includes all data presented in the manuscript and implements the algorithm as a function.

This version includes a number of small changes throughout the manuscript.

## Supplemental Information

### Example: Curvilinear vs. Linear Change-Point Analysis of Reaction Times

Reaction times from a single subject learning a psychophysical task, originally reported by Palmeri (1997). The dashed line corresponds to a four-parameter “learning curve,” reported by Heathcote et al. (2000), while the solid lines interpret the same data as approximately linear, with two change-points.

### Prior vs. Posterior Probability Densities Given Different Hyperparameters

Prior hyperparameters *α*_{0} and *α*_{1} (left) and posterior hyperparameters *α*_{0}′ and *α*_{1}′ for the beta distribution given three different prior distributions and the same empirical observations. See Equation 2 and the Supplement for details.

### Maximum Likelihood vs. Marginal Model Likelihood

Likelihood as a function of the probability θ for the binomial distribution, given 20 observations, consisting of 10 success and 10 failures. Darker areas indicate higher relative likelihoods. (Left Panel) When a single parameter is used to describe the data, the maximum likelihood estimator is *θ*_{hat} = ^{10}/_{20} = 0.5. (Center Panel) The first ten observations and the second ten observations are examined independently, and assigned their own MLE parameters *θ*_{hat_1} = ^{4}/_{10} = 0.4 and *θ*_{hat_2} = ^{6}/_{10} = 0.6. This increase in model complexity improves the maximum likelihood, but also results in a smaller (and therefore less favorable) marginal model likelihood than that observed for the one-parameter model. (Right Panel) If splitting the observations into two groups instead reveals that *θ*_{hat_1} = ^{2}/_{10} = 0.2 and *θ*_{hat_2} = ^{8}/_{10} = 0.8, this results in greater maximum and marginal model likelihoods. All estimations in this figure assume the prior hyperparameters *α*_{0} = *α*_{1} = 1. On the basis of these results, choosing a two-parameter model is justified in the scenario depicted in the right panel, but not for the center panel, despite both having higher maximum likelihoods than the one-parameter model in the left panel.

### Uniform vs. Non-Uniform Event Times

An example of uniform vs. non-uniform segmentation. (Left) In the uniform case, each segment of time is an equally strong candidate for a putative change-point. In this example, the nine segments each have a common weight of ^{1}/_{9} . (Right) In the non-uniform case, a change-point is presumed to be located somewhere between *t*_{(1)} and *t*_{(10)}, and if any point along that interval is as likely as any other, then longer segments are correspondingly more likely to contain change-points.

### Binomial Data, Uncorrected vs. Corrected With Respect To Small Sample Bias

Values of log(*k*_{(c)}·*T*_{(c)} given 100 binary observations [0,1,0,1,··· ,0,1] both uncorrected (left) and corrected (right). The dashed line represents the boundary between evidence favoring a change at time (positive) vs. evidence against (negative) at event *c*.

### Gaussian Data, Uncorrected vs. Corrected With Respect To Small Sample Bias

Values of log (*k*_{(c)} · *T*_{(c)} ) given 1000 randomized Gaussian observations (*μ* = 0, *σ* = 1), rep- resenting data in the range ±z = 3.3 both uncorrected (left) and corrected (right). The dashed line represents the boundary between evidence favoring a change at time (positive) vs. evidence against (negative) at event *c*.

### Applying the CPR Algorithm To A String of Binary Data

The CPR algorithm applied to a binary string. (Top) The individual Bayes factors k(c) (here presented on a log scale) are calculated for each point, and there is a clear peak when *c* = 8. (Bottom) The algorithm is then recursively applied to the two segment halves, with weak support found when *c* = 34. Because this support is too weak to satisfy the decision criterion, it is not selected. The dashed line represents the boundary between evidence favoring a change at time (positive) vs. evidence against (negative) at event c; however, note that a change-point is only added to the model if the posterior odds ^{p1′}/_{p0'} for a segment favors a change, which is not the case for either segment in the bottom panel.

### Initial Learning For 25 Simultaneous Chains

Plot of item acquisition in 25 different lists in Jensen et al. (2013) , learned using the Simultaneous Chain procedure. (Left) Estimated time of acquisition for each list item is indicated both by elevation and by shade of gray. (Right) Histogram showing frequency of inter-acquisition periods, in trials, for each item, as well as the number of trials overall needed to acquire all list items. Bars do not sum to 100%, because acquisition did not occur in all lists.

### Reaction Times As A Function Of Trials And Task Difficulty

Reaction times as a function of trials and task difficulty in Palmeri (1997). (Top) Each of the 6132 reaction times, color-coded according to their speed and positioned with respect to trial and stimulus numerosity. (Bottom) The model fit resulting from a multiple regression, subdivided according to the CPR algorithm.

### Stimulus-Specific Reaction Times as a Function of Trials

Reaction times as a function of trials in Palmeri (1997) for specific stimuli in a single participant. Lines represent Bayesian regression estimates whose subdivisions were determined using to the change-point algorithm. (Left) 6-dot stimulus learning for most rapidly acquired stimulus (blue) and the least rapid (red). (Right) 11-dot stimulus learning for most rapidly acquired stimulus (blue) and the least rapid (red).

### 3D Position Data, Assessed With Respect To A Multivariate Normal Distribution

3D position data collected by Kaluža et al. (2010). Individual points are multivariate, such that an observation an *x*-coordinate (in the top panel), a *y*-coordinate (in the center panel), and a *z*-coordinate (in the bottom panel). Change-points were identified using the CPR algorithm assuming a multivariate normal distribution.

### 3D Movement Data Visualized With Respect To Time

3D depiction of the a subset of the movement data from Kaluža et al. (2010) and its associated change-point model. Individual points represent discrete observations and are color-coded continuously with respect to time, as noted on the color bar. The points on the color bar denote the times at which change-points were detected by the CPR algorithm. Ellipses represent the means and covariance associated with particular segments of data, estimated post-hoc using the robust method described by Campbell (1980). The thin colored line, which is drawn along the horizontal plane, represents a moving average of 50 points.

### Detailed View of 3D Movement Data, Given a Multivariate Normal Model

Detailed plot of the data presented in Figure 11, represented in terms of recorded sensor values alone each axis. Gray zones are identified by Kaluža et al. as ‘transitional periods’ (such as falling or sitting down), while dashed lines indicated change-points detected by the CPR algorithm, given a multivariate normal model.

### Detailed View of 3D Movement Data, Given a Multiple Linear Regression Model

Detailed plot of the data presented in Figure 11, re-analyzed using a multiple linear regression model. The thin, pale lines represent the raw data, while the thick lines correspond to the best-fitting linear fits. Gray zones are identified by Kaluža et al. as ‘transitional periods’ (such as falling or sitting down), while dashed lines indicated change- points detected by the CPR algorithm, given a multiple regression model.

### Supplemental Material

A variety of non-psychological datasets are analyzed to demonstrate broad applications of the *Conjugate Partitioned Recursion* (CPR) algorithm. A method for performing a sensitivity analysis is demonstrated. A weakness in the algorithm is identified in cases of large-scale stationary distributions, and two modifications to the standard procedure are proposed to circumvent that weakness: The 'dicing' operation (whose function is strictly exploratory) and the 'forward-retrospective' algorithm, which makes assessments sequentially rather than recursively. Finally, the mathematical basis for the conjugate priors invoked in the main article and the formulas for empirical Bayes 'rule-of-thumb' priors are provided.

### Supplemental: British Coal-Mining Disaster Analysis

British coal-mining disasters between 1851 and 1962, represented as days between disasters, which are approximately exponentially distributed (left) and number of disasters per year, which are approximately Poisson distributed (right). In both cases, the CPR algorithm, using the appropriate rule-of-thumb empirical prior, finds a single change-point. Each plot shows its distribution’s probability function as a color map, based on the posterior parameter estimates.

### Supplemental: Well-Log Data

Nuclear-magnetic response of rock strata in an oil well bore-hole, reported by Ó Ruanaidh & Fitzgerald (1996). Points correspond to individual observations. The red line indicates the estimated mean (with one standard deviation shown in light red) within segments identified by the CPR algorithm.

### Supplemental: US Treasury Bill Data

Nominal rates on three-month U.S. Treasury bills, calculated monthly from June 1947 to February 2013. (Top) Nominal rates, with red lines displaying the model inferred by a linear regression change-point analysis. (Bottom) First differences of nominal rates (*x*_{i} − *x*_{i−1}), with the means (red) and one standard deviation (light red) from a Gaussian change-point analysis.

### Supplemental: Sensitivity Analysis with Respect to *τ*

Sensitivity analysis performed on two datasets from the main body text and three from the supplement. The decision criterion *τ *was varied between 1 and 100, and the SBIC was calculated for each resulting model, using Equation 3. Relative SBIC among the models is plotted in red (with the best model set at zero), and the number of data segments is plotted in blue.

### Supplemental: Simulated Data Showcasing A Failure Condition For The CPR Algorithm

Simulated data generated using a chaotic distribution of Gaussian parameters derived from Equation S4. (Left) The full dataset *D*, consisting of 5050 simulated observations that were generated in blocks of 50. The default CPR algorithm will not identify most of these change-points. (Right) A subset of *D*, with the true change-points denoted by dashed gray lines. The CPR algorithm will reliably identify these change-points if it is run only on this subset, despite failing to detect them when run on the complete dataset.

### Supplemental: Simulation Data Analyzed Using the 'Dicing' Operation and the Forward-Retrospective Strategy

Change-points identified in simulated data using the alternative strategies laid out for dealing with stationary data. (Top) Trials for which change- points were identified, as a function of the number of subdivisions examined by the dicing operation. (Center) A histogram of how often, across 30 values for the dicing parameter, a precise change-point was identified. (Bottom) Trials for which change-points were identified using the forward-retrospective algorithm.

### Matlab Implementation of the CPR Algorithm

Feature-rich Matlab function implementing the CPR algorithm for the ten conjugate priors described in the Supplement. The forward-retrospective variant described in the Supplement is implemented as a second function.

### Matlab Data and Instructional Vignettes

All data presented in the manuscript and supplement are included in Matlab format, as well as basic code for performing the analyses using the provided Matlab functions.