Novel trajectory clustering method based on distance dependent Chinese restaurant process
 Academic Editor
 Pablo Arbelaez
 Subject Areas
 Artificial Intelligence, Computer Vision, Visual Analytics
 Keywords
 Path modelling, Trajectory clustering, Anomaly detection, Chinese restaurant process, Distance dependent CRP
 Copyright
 © 2019 Arfa et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
 2019. Novel trajectory clustering method based on distance dependent Chinese restaurant process. PeerJ Computer Science 5:e206 https://doi.org/10.7717/peerjcs.206
Abstract
Trajectory clustering and path modelling are two core tasks in intelligent transport systems with a wide range of applications, from modeling drivers’ behavior to traffic monitoring of road intersections. Traditional trajectory analysis considers them as separate tasks, where the system first clusters the trajectories into a known number of clusters and then the path taken in each cluster is modelled. However, such a hierarchy does not allow the knowledge of the path model to be used to improve the performance of trajectory clustering. Based on the distance dependent Chinese restaurant process (DDCRP), a trajectory analysis system that simultaneously performs trajectory clustering and path modelling was proposed. Unlike most traditional approaches where the number of clusters should be known, the proposed method decides the number of clusters automatically. The proposed algorithm was tested on two publicly available trajectory datasets, and the experimental results recorded better performance and considerable improvement in both datasets for the task of trajectory clustering compared to traditional approaches. The study proved that the proposed method is an appropriate candidate to be used for trajectory clustering and path modelling.
Introduction
The trajectory of a moving object obtained by tracking the object’s position from one frame to the next is a simple yet efficient descriptor of an object’s motion. Trajectory analysis has long been a research focus in different fields of study (Jonsen, Myers & Flemming, 2003; Pao et al., 2012; Reed et al., 1999; Fox, Sudderth & Willsky, 2007). In the context of intelligent transportation systems (ITS) (Tian et al., 2017), trajectory clustering is a critical core technology in many surveillance applications including activity analysis (Morris & Trivedi, 2011), path modelling (Zhang, Lu & Li, 2009), anomaly detection (Dee & Velastin, 2008), and road intersection traffic monitoring (Aköz & Karsligil, 2014).
Many trajectory analysis systems consist of two main steps. In the first step, trajectories are grouped into clusters based on their similarities. Most proposed methods assume the number of clusters to be known. After the trajectories are clustered, the path taken by agents in each cluster is modelled. There are at least two limitations with these approaches. First, in real-world problems, the number of clusters is usually unknown or is expensive to acquire. Second, trajectory clusters and path models are closely related, so knowledge of one can help improve the performance of the other.
Most existing trajectory analysis methods can be categorized into similarity-based models and Probabilistic Topic Models (PTM). The main stages of similarity-based approaches are calculating a similarity matrix and clustering the trajectories based on the similarity matrix. At the first stage, pairwise similarities between trajectories are obtained via a similarity function and stored in an N × N matrix, where N is the total number of available trajectories. Defining a suitable similarity measure is a challenging task that directly affects the overall accuracy of the system (Zhang, Kaiqi & Tieniu, 2006). Well-known similarity measures used for trajectory analysis include Euclidean distance, dynamic time warping (DTW) (Keogh & Pazzani, 2000), Hausdorff distance (Atev, Miller & Papanikolopoulos, 2010), and Longest Common Subsequence (LCSS) (Vlachos, Kollios & Gunopulos, 2002). After the similarity matrix is obtained, the second stage uses any standard clustering algorithm to cluster the trajectories into K clusters based on their similarities. Typical clustering algorithms include spectral clustering (Ng, Jordan & Weiss, 2002), agglomerative clustering (Xi, Weiming & Wei, 2006), and fuzzy c-means (Weiming et al., 2006). The main disadvantage of similarity-based approaches is that they require the number of clusters, K, to be known in advance.
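The first stage of a similarity-based pipeline can be illustrated with a minimal sketch. It computes a textbook dynamic time warping distance between two sequences of 2-D points and fills a symmetric N × N matrix. The function names `dtw_distance` and `distance_matrix` are hypothetical, the Euclidean point cost is an assumption, and this is not the implementation used in the experiments.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean point cost (assumption)
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def distance_matrix(trajectories, dist=dtw_distance):
    """Symmetric N x N matrix of pairwise trajectory distances."""
    n = len(trajectories)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(trajectories[i], trajectories[j])
    return D
```

Any other measure with the same call signature, such as an LCSS or modified Hausdorff implementation, could be passed as `dist` without changing the matrix construction.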
Once trajectories are clustered, some studies perform path modelling in a further stage. Path models are useful in intelligent surveillance systems, where they are used for compact representation of clusters, real-time anomaly detection (Morris & Trivedi, 2011), high-level scene understanding (Lei et al., 2014), and route planning (Joseph et al., 2011). Makris & Ellis (2005) modelled the path as an envelope, which denotes the extent of a path by finding the two farthest samples in a cluster. Morris & Trivedi (2011) used the weighted average of trajectories of each cluster to form the path model for that cluster. Based on the dominant set clustering approach, Yiwen et al. (2014) proposed a system that obtains the scene structure from clustered trajectories.
All these approaches, however, model the path after the trajectories are clustered. Therefore, the performance of the modelled path is limited to how well trajectories are clustered. Also, the modelled path is not used to improve the trajectory clustering.
Another well-known class of approaches in trajectory analysis is based on the probabilistic topic model (PTM) (Papadopoulos, 2008). In PTM approaches, trajectories are first converted into a set of symbols via a predefined codebook. This new representation of trajectories is then treated as documents while the symbols are treated as words. Compared to a similarity-based approach, trajectory analysis methods based on PTM do not usually require the number of clusters in advance.
Jeong, Chang & Choi (2011) used latent Dirichlet allocation (LDA) and the hidden Markov model (HMM) to discover the semantic regions and the temporal relationship between them. A two-level LDA topic model is proposed by Song et al. (Lei et al., 2014). The first-level LDA models the motion of a single agent as distributions over patch-based features. The second-level LDA uses the output of the first level to learn interactions among multiple agents. This model, however, does not perform trajectory clustering.
Wang et al. (2011) proposed a dual hierarchical Dirichlet process (Dual-HDP). Unlike previous PTM models, Dual-HDP is capable of clustering the trajectories and modelling the semantic scene at the same time. Each semantic region is modelled as a distribution over grids, and the scene is modelled as a distribution over the semantic regions. The number of clusters and the number of semantic regions are decided automatically. Since the model relies only on a bag-of-grids representation, it cannot capture the long-term dependency between observations. This results in only a partial path model for each cluster. Having a full path model is an important step for interpreting agents’ movement in scenarios such as highways and junctions.
Furthermore, since only quantised trajectories are used, the overall performance of Dual-HDP is highly sensitive to grid size. Choosing a large grid size rapidly decreases the performance due to quantisation error. On the other hand, choosing a small grid size requires considerably more data to learn the trajectory patterns.
This study proposed a trajectory clustering and path modelling system that clusters the trajectories and models the path taken by each cluster at the same time. Our approach is based on the distance dependent Chinese restaurant process (DDCRP) (Blei & Frazier, 2011), which is a generalisation of the standard Chinese restaurant process (CRP) (Pitman, 2002).
Methods
Distance dependent Chinese restaurant process
The Chinese restaurant process (CRP) is a distribution on partitions of integers proposed by Pitman (2002). CRP can be explained by the following analogy: Imagine a Chinese restaurant with an infinite number of tables. The first customer enters the restaurant and sits at the first table with probability 1. Each subsequent customer enters the restaurant and either sits at an occupied table with probability proportional to the number of customers already sitting at that table, or sits at an empty table with probability proportional to a parameter α. After this process, which is known as customer-table assignment, customers sitting at the same table share the same dish. This process can be described as follows: (1) $P\left(z_{i}=k \mid z_{-i},\alpha \right)\propto \begin{cases} n_{k}, & k\le K \\ \alpha , & k=K+1 \end{cases}$
where z_{i} denotes the table assignment for the ith customer, K is the total number of occupied tables, z_{−i} is the table assignment of all customers except the ith customer, and n_{k} is the total number of customers sitting at the kth table. More details of CRP and its connection to the Dirichlet process can be found in Gershman & Blei (2012).
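The seating rule of Eq. (1) can be simulated in a few lines. The following is a minimal sketch rather than part of the proposed system; the function name `sample_crp` and the fixed random seed are assumptions made for illustration.

```python
import random


def sample_crp(num_customers, alpha, rng=None):
    """Draw one seating arrangement from a CRP with concentration alpha > 0."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    assignments = []  # z_i: table index of each customer
    counts = []       # n_k: number of customers at each occupied table
    for _ in range(num_customers):
        # Occupied tables are weighted by n_k, a new table by alpha (Eq. 1).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an occupied table
        assignments.append(k)
    return assignments, counts
```

Because the first customer sees only the weight α, that customer always opens the first table, which matches the analogy above.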
The distance dependent Chinese restaurant process (DDCRP) generalises the CRP and allows for a non-exchangeable distribution over partitions (Blei & Frazier, 2011). Unlike CRP, where each customer is assigned to a table, in DDCRP each customer is assigned to another customer with a probability that depends on their distance/similarity. Therefore, the more similar two customers are, the more probable it is that they will be directly linked. It is important to note that it is still possible for two customers with small similarities to be indirectly linked to each other via intermediate customers. After this procedure, which is also known as customer-to-customer assignment, customers who are directly or indirectly linked sit at the same table and share the same dish.
More formally, let d_{ij} represent the distance between the ith and jth customers. The probability that customer i has a direct link to customer j is calculated as: (2) $P\left(c_{i}=j \mid \mathit{D},f,\tau \right)\propto \begin{cases} \tau , & \text{if } i=j \\ f\left(d_{ij}\right), & \text{otherwise} \end{cases}$
where f(d) denotes a monotonically decreasing decay function that satisfies $f\left(\infty \right)=0$, D is the matrix of pairwise distances between customers, and τ is a constant that controls the probability of a self-link.
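As a rough illustration of Eq. (2), the sketch below normalises the link weights of one customer and draws a link for every customer from the prior alone. The names `link_probabilities` and `sample_links` are hypothetical, and the decay function is passed in by the caller; an exponential decay is only one possible choice of a decreasing function.

```python
import math
import random


def link_probabilities(D, i, tau, decay):
    """Normalised probabilities that customer i links to each customer j (Eq. 2)."""
    weights = [tau if j == i else decay(D[i][j]) for j in range(len(D))]
    total = sum(weights)
    return [w / total for w in weights]


def sample_links(D, tau, decay, rng=None):
    """Draw one link c_i per customer from the DDCRP prior (no data term)."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    n = len(D)
    return [
        rng.choices(range(n), weights=link_probabilities(D, i, tau, decay))[0]
        for i in range(n)
    ]


# Example: customers 0 and 1 are close, customer 2 is far from both.
D = [[0, 1, 10], [1, 0, 10], [10, 10, 0]]
probs = link_probabilities(D, 0, tau=1.0, decay=lambda d: math.exp(-d))
```

In this example the self-link and the link to the nearby customer receive almost all of the probability mass, while the link to the distant customer is very unlikely.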
The DDCRP was proposed originally for modelling non-exchangeable text documents where the distance between the dates of documents determines their similarity. The documents are converted into their bag-of-words (BoW) representation before the posterior probability of DDCRP is calculated. Such a conversion to BoW representation is a crucial step that makes the inference of DDCRP computationally tractable.
Recently researchers have adopted DDCRP for problems beyond language processing. Ghosh et al. (2011) proposed a hierarchical extension of DDCRP for producing coarser image segmentations in the form of human-like segmentations. In a more recent study, Baldassano, Beck & Li (2015) used DDCRP to model a complex web of connections with a small number of interacting units. The proposed method is used to model the connectivity between subregions of the human brain and to analyse human migration behaviour. Also, Ren et al. (2016) used DDCRP for key frame selection from unordered image sets, where the selected frames are used for dense 3D reconstruction.
Trajectory analysis with distance dependent CRP
Unlike text data where observations in documents are words sampled from a corpus with a limited number of words, observations in trajectories are not discrete. Trajectories are vectors with varying length where each observation gets a real value bounded by the scene’s size. One can divide the scene into blocks of equal sizes and convert a trajectory into its discrete form. After such a conversion, the resulting quantized trajectories are equal length vectors and each observation gets a discrete value. The size of grids in this case, however, will have a direct impact on the system performance. While theoretically smaller grids can improve the performance, they require substantially more data for training.
Another disadvantage of treating trajectories as documents is the bag-of-words representation. Such a representation discards the order of observations. Discarding the order of samples in trajectory data is problematic since agents moving in opposite directions can share the same observations over grids. One solution to avoid this problem is to quantise the direction of observations (Wang et al., 2011). Estimating the direction of an observation requires further processing and can be inaccurate. Such a quantisation also increases the size of the corpus and, therefore, requires more data for training. In addition, with the bag-of-words representation alone, long-term dependencies between observations cannot be captured, which results in partial path models in existing PTM approaches.
We addressed these problems by using similarity between trajectories as the prior probability in DDCRP. Using such a prior probability limits the assignment of trajectories and promotes trajectories to get linked based on how similar two trajectories are. In addition to the similarity measure, whether the trajectories are linked together or not, also will depend on their discrete observation over the grids. Since most similarity measures can be applied prior to converting the trajectory into discrete form, such a formulation is less sensitive to the choice of grid size. In addition, since some similarity measures, including Modified Hausdorff and LCSS, also take the order of the observations into account, it is not required to quantise the direction anymore.
Any raw trajectory T_{i} is usually represented by a sequence of its n_{i} observations, T_{i} = [o_{i,1}, ..., o_{i,l}, ..., o_{i,ni}]. In this representation, o_{i,l} indicates the lth observed position of the ith object. Let d_{ij} denote the pairwise distance between the ith and jth trajectories. This distance can be any general distance used to measure similarity between trajectories. The pairwise distances between N trajectories can be stored in a distance matrix denoted as D ∈ ℜ^{N×N}.
Apart from the calculation of the distance matrix discussed above, raw trajectories are converted into a bag-of-grids representation. For this, the scene is divided into M grid cells of equal size. Each observation o_{i,l} of a trajectory is individually quantised according to the cell into which it falls. Then a raw trajectory, T_{i}, is approximated by the bag-of-grids representation X_{i} ∈ ℜ^{M}. Each element X_{i}(s) indicates the number of times the ith trajectory had an observation in the sth grid cell.
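The quantisation step can be sketched as follows. The function name `bag_of_grids`, the row-major cell indexing, and the clamping of out-of-range points are assumptions made for illustration. The example values reuse the Lankershim region reported later in the paper (−80 < X < 80, 300 < Y < 500, 10 × 10 cells, giving 16 by 20 cells).

```python
def bag_of_grids(trajectory, x_min, y_min, cell_w, cell_h, n_cols, n_rows):
    """Count how many observations of a trajectory fall in each grid cell."""
    counts = [0] * (n_cols * n_rows)  # X_i, a vector of length M = n_cols * n_rows
    for x, y in trajectory:
        col = int((x - x_min) // cell_w)
        row = int((y - y_min) // cell_h)
        # Clamp points on or beyond the scene border into the nearest cell (assumption).
        col = min(max(col, 0), n_cols - 1)
        row = min(max(row, 0), n_rows - 1)
        counts[row * n_cols + col] += 1  # row-major cell index (assumption)
    return counts


# Illustrative trajectory with three observations inside the Lankershim region.
example = [(-80.0, 300.0), (-75.0, 305.0), (79.9, 499.9)]
X_example = bag_of_grids(example, x_min=-80, y_min=300, cell_w=10, cell_h=10,
                         n_cols=16, n_rows=20)
```

The first two observations share the first cell and the third falls in the last cell, so the resulting vector has length 320 with only two non-zero entries.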
Using DDCRP’s metaphor, we treat the bag-of-grids representations of trajectories as customers, clusters as tables, and path models as dishes. Based on the definition of DDCRP, it is not possible to draw the table assignment directly. Instead, the outgoing link for each customer needs to be drawn. Trajectories that directly or indirectly link together are considered to be in the same cluster. All trajectories in the same cluster share the same path model, which is a multinomial distribution over the grid cells. Each path model is independently drawn from a base distribution G_{0}. In our case, G_{0} is a Dirichlet distribution. The full generative process is as follows:

For each trajectory, sample a customer assignment c_{i} ∼ DDCRP(D, f, τ) as explained in Eq. (2).

Derive the table assignments from the customer assignments. For each table k, sample its parameter from the base distribution, φ_{k} ∼ G_{0}.

For each trajectory, independently draw X_{i} ∼ Mult(· | φ_{z_{i}}).
The decay function, f(·), in Eq. (2) was defined as: (3) $f\left(d;\gamma ;\gamma_{0}\right)=\exp\left(-\frac{d}{\gamma}\right).$
With this function, the probability of linking two trajectories becomes smaller as their distance increases. The parameter γ controls how fast this probability decays with increasing distance. The inference of DDCRP requires drawing samples for all samples which have the possibility of being linked.
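The step of the generative process that derives table assignments from customer links amounts to finding the connected components of the link graph. A minimal union-find sketch is given below; the function name `tables_from_links` is hypothetical, and tables are numbered in order of first appearance, which is an arbitrary choice.

```python
def tables_from_links(links):
    """Map customer links c_i to table labels z_i via connected components."""
    n = len(links)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Customers i and c_i always end up at the same table.
    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    labels = {}
    tables = []
    for i in range(n):
        root = find(i)
        tables.append(labels.setdefault(root, len(labels)))
    return tables
```

For example, the links `[1, 1, 3, 3, 4]` yield three tables: customers 0 and 1 sit together, customers 2 and 3 sit together, and customer 4, who links to itself, sits alone.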
Inference
The key problem that needs to be addressed is computing the posterior distribution of the latent customer assignments conditioned on the bag-of-grids representation of trajectories X_{1:N}. In our problem, the base distribution G_{0} is conjugate to the data generating distribution P(X_{i} | Z_{c_i}, G_{0}). Therefore, the cluster parameters φ_{k} can be analytically marginalised. After such a calculation, the posterior distribution is expressed by: (4) $P\left(c_{1:N} \mid \mathit{X}_{1:N},\mathit{D},f,\tau ,\gamma ,\gamma_{0}\right)\propto \left[\prod_{i=1}^{N}P\left(c_{i} \mid \mathit{D},f,\tau ,\gamma ,\gamma_{0}\right)\right]P\left(\mathit{X}_{1:N} \mid Z\left(c_{1:N}\right)\right)$
where Z(c_{1:N}) denotes the table assignment and P(X_{1:N} | Z(c_{1:N})) is the likelihood function, which can be expressed as (Blei & Frazier, 2011): (5) $P\left(\mathit{X}_{1:N} \mid Z\left(c_{1:N}\right)\right)=\prod_{k=1}^{\left|Z\left(c_{1:N}\right)\right|}P\left(\mathit{X}_{z^{k}\left(c_{1:N}\right)} \mid Z\left(c_{1:N}\right)\right)$
with $\left|Z\left(c_{1:N}\right)\right|$ being the number of unique tables and $z^{k}\left(c_{1:N}\right)$ denoting all customers assigned to table k.
Because the normalising constant of Eq. (4) involves a combinatorial sum over all possible customer assignments, the posterior cannot be computed analytically. Instead of exact inference, a collapsed Gibbs sampling strategy (Blei & Frazier, 2011) is used for posterior inference, in which each customer assignment is iteratively sampled from the following conditional: (6) $P\left(c_{i} \mid c_{-i},\mathit{X}_{1:N},\mathit{D},f,\tau ,\gamma ,\gamma_{0}\right)\propto P\left(c_{i} \mid \mathit{D},f,\tau \right)\times P\left(\mathit{X}_{1:N} \mid z\left(c_{-i}\cup c_{i}\right)\right)$
where c_{−i} denotes all customer assignments except for c_{i}. The first term on the right side of the equation is DDCRP’s customer assignment discussed in Eq. (2), and the second term is the likelihood term given by Eq. (5). More details can be found in the Supplemental Material.
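A naive version of one Gibbs update can be sketched as follows. Several assumptions are made here and are not taken from the paper: the base distribution is a symmetric Dirichlet with a hypothetical concentration `alpha`, the decay is exponential with scale `gamma`, `tau` is positive, and every candidate link is scored by recomputing the full likelihood, which is far less efficient than the incremental updates described by Blei & Frazier (2011). Multinomial coefficients are omitted because they do not depend on the partition.

```python
import math
import random


def log_marginal(counts, alpha):
    """Log Dirichlet-multinomial marginal of the pooled grid counts of one table."""
    m, n = len(counts), sum(counts)
    out = math.lgamma(m * alpha) - math.lgamma(m * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def components(links):
    """Table label per customer: connected components of the link graph."""
    label = list(range(len(links)))
    changed = True
    while changed:  # simple label propagation, adequate for small N
        changed = False
        for i, j in enumerate(links):
            low = min(label[i], label[j])
            if label[i] != low or label[j] != low:
                label[i] = label[j] = low
                changed = True
    return label


def log_likelihood(X, links, alpha):
    """Log of Eq. (5): sum of per-table marginal likelihoods."""
    pooled = {}
    for x, k in zip(X, components(links)):
        acc = pooled.setdefault(k, [0] * len(x))
        for s, v in enumerate(x):
            acc[s] += v
    return sum(log_marginal(c, alpha) for c in pooled.values())


def gibbs_resample_link(X, D, links, i, tau, gamma, alpha, rng):
    """Resample c_i in place from Eq. (6) by scoring every candidate link j."""
    n = len(X)
    log_w = []
    for j in range(n):
        # Log prior of Eq. (2) with an exponential decay (assumes tau > 0).
        log_prior = math.log(tau) if j == i else -D[i][j] / gamma
        trial = links[:i] + [j] + links[i + 1:]
        log_w.append(log_prior + log_likelihood(X, trial, alpha))
    top = max(log_w)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(w - top) for w in log_w]
    links[i] = rng.choices(range(n), weights=weights)[0]
    return links
```

A full sweep would call `gibbs_resample_link` once for every index i, and repeated sweeps produce the sequence of samples from which cluster assignments are read.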
Results and Discussion
The performance of the proposed approach was evaluated on the CROSS (Morris & Trivedi, 2011) and the Lankershim datasets (NGSIM: Next Generation Simulation, 2008).
The CROSS dataset provides object trajectories and their ground truth activities. The data are organized into train and test sets. There are 1,900 and 9,700 trajectories in the train and test sets respectively. Two hundred samples in the test set are labeled as abnormal activities. These samples were discarded in this study and we evaluated the proposed model on the 9,500 trajectories in the test set with legal activities (Fig. 1).
The Lankershim dataset is part of the Next Generation Simulation (NGSIM) program provided by the US Federal Highway Administration (FHWA). The dataset contains videos taken with overhead intersection cameras, together with the trajectories of moving vehicles. Based on the time the videos were collected, the data are divided into 8:30 am to 8:45 am and 8:45 am to 9:00 am subsets. The trajectories of interest took place near an intersection, and trajectories outside of this area were removed (see Fig. 2). The corresponding X and Y coordinates for this region were −80 < X < 80 and 300 < Y < 500 respectively. After filtering out the trajectories with fewer than ten observations, a total of 2,212 trajectories were obtained. Since this dataset does not provide activity labels for trajectories, the trajectories were manually labelled into 21 activities (19 legal activities, and two activities where agents took illegal maneuvers).
The main parameter that needs to be set prior to experiments is the size of the grid cells. Theoretically, smaller grid cells produce a better result at the cost of requiring more data. Based on the performed experiments, the cell size was set to 40 × 25 pixels for the CROSS dataset and to 10 × 10 pixels for the Lankershim dataset. These choices of cell size divide the CROSS and Lankershim scenes into 9 by 19 and 16 by 20 equal-sized grid cells respectively. Each raw trajectory was converted into the bag-of-grids representation described in the Trajectory analysis with distance dependent CRP section. The dimensions of the bag-of-grids representations are X_{i} ∈ ℜ^{1×171} and X_{i} ∈ ℜ^{1×320} for the CROSS and Lankershim datasets respectively.
The correct clustering rate (CCR) is used to evaluate the clustering performance. The CCR has been used as an evaluation criterion to verify trajectory clustering algorithms in several studies (Morris & Trivedi, 2009; Weiming et al., 2013; Zhang, Kaiqi & Tieniu, 2006). Given the ground truth set G and the resulting cluster set C, the correspondence between clusters and ground truth classes that maximizes the number of matched labels is found. The CCR is defined as: (7) $CCR=\frac{1}{N}\sum_{i=1}^{K}p_{i}$
where N is the number of trajectories and K is the number of clusters in the ground truth. Given the assignment between ground truth and estimated cluster labels, p_{i} is computed as (Zhang, Kaiqi & Tieniu, 2006): (8) $p_{i}=\begin{cases} \left|c_{i}\cap g_{m}\right|, & \text{given } c_{i}\in C \text{ assigned to } g_{m}\in G \\ 0, & \text{otherwise} \end{cases}$
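A possible way to compute Eqs. (7) and (8) is sketched below. The text does not spell out how the assignment between clusters and ground truth classes is found, so this sketch assumes a one-to-one assignment built greedily from the largest overlaps; an optimal assignment would require the Hungarian algorithm. The function name `correct_clustering_rate` is hypothetical.

```python
from collections import Counter


def correct_clustering_rate(truth, predicted):
    """CCR under a greedy one-to-one cluster-to-class assignment (assumption)."""
    # overlap[(c, g)] = |c ∩ g|: trajectories in cluster c with ground truth g.
    overlap = Counter(zip(predicted, truth))
    used_clusters, used_classes = set(), set()
    matched = 0
    # Take the largest remaining overlap first; each cluster and class is used once.
    for (c, g), size in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if c not in used_clusters and g not in used_classes:
            used_clusters.add(c)
            used_classes.add(g)
            matched += size  # this is p_i for the matched cluster
    return matched / len(truth)
```

With a one-to-one assignment, splitting one ground truth class across two clusters lowers the score, which is consistent with the drop in CCR reported when more clusters are requested than exist in the ground truth.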
The proposed method was compared with dual-HDP and three well-known distance measures: LCSS, DTW, and modified Hausdorff (MH). For each distance, four unsupervised clustering algorithms were used: K-means clustering, spectral clustering, agglomerative clustering, and graph-based clustering. The average CCR of the clustering algorithms for each distance measure is reported in this study. One limitation of distance-based clustering techniques is that they require the number of clusters to be given to them.
To show the effect of the chosen number of clusters on the performance, the experiments were run with different numbers of clusters, including the true value. The other parameters of competitor methods were set during the course of experiments to achieve their maximum accuracy. For the proposed method, collapsed Gibbs sampling was performed for 100 samples. After each sample, CCR was evaluated based on the customer assignment result. Figure 3 shows CCR per sample for the Lankershim and CROSS datasets. In all methods, CCR exceeds 0.9 after the 3rd sample. The average CCR is obtained by averaging the CCR values after discarding the first ten samples.
The results of trajectory clustering accuracy for the CROSS dataset are summarized in Table 1. The best correct clustering rate, 0.993, is obtained by DDCRP when using LCSS as the distance measure. The average correct clustering rate of LCSS with traditional clustering algorithms is 0.986. While this value is only slightly less than the performance produced by LCSS and DDCRP, it needs to be highlighted that traditional clustering techniques achieved a 0.986 correct clustering rate under the assumption that the true total number of clusters is known. Also, the proposed method improves the correct clustering rate regardless of which similarity measure is used. In other words, using DTW or MH as the similarity measure along with DDCRP achieves a better average CCR compared to traditional clustering algorithms.
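Table 1: Correct clustering rate (CCR) on the CROSS dataset for different numbers of clusters.

| Number of Clusters | 5 | 10 | 15 | 19 | 20 | 21 | 25 | 30 |
|---|---|---|---|---|---|---|---|---|
| DTW | 0.292 | 0.559 | 0.806 | 0.971 | 0.984 | 0.968 | 0.916 | 0.857 |
| LCSS | 0.291 | 0.555 | 0.805 | 0.986 | 0.971 | 0.952 | 0.864 | 0.792 |
| MH | 0.556 | 0.559 | 0.807 | 0.986 | 0.986 | 0.973 | 0.934 | 0.879 |
| Dual HDP | – | – | – | – | – | 0.801 | – | – |
| DDCRP (DTW) | – | – | – | 0.986 | – | – | – | – |
| DDCRP (LCSS) | – | – | – | 0.993 | – | – | – | – |
| DDCRP (MH) | – | – | – | 0.989 | – | – | – | – |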
Similarly, Table 2 summarizes the clustering accuracy for the Lankershim dataset. Using DDCRP along with the MH distance produces the best correct clustering rate of 0.998. As with the CROSS dataset, the proposed method improves the correct clustering rate regardless of which similarity measure is used. The most notable improvement is when DTW is used as the similarity measure. In this case, the average CCR for similarity-based clustering is 0.868 while the combination of DTW and DDCRP results in a CCR of 0.996.
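Table 2: Correct clustering rate (CCR) on the Lankershim dataset for different numbers of clusters.

| Number of Clusters | 5 | 10 | 15 | 18 | 19 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|---|---|
| DTW | 0.453 | 0.705 | 0.901 | 0.864 | 0.868 | 0.864 | 0.828 | 0.789 |
| LCSS | 0.529 | 0.846 | 0.901 | 0.924 | 0.925 | 0.931 | 0.912 | 0.899 |
| MH | 0.488 | 0.840 | 0.973 | 0.985 | 0.977 | 0.974 | 0.937 | 0.902 |
| Dual HDP | – | – | – | 0.974 | – | – | – | – |
| DDCRP (DTW) | – | – | – | – | 0.996 | – | – | – |
| DDCRP (LCSS) | – | – | – | – | 0.996 | – | – | – |
| DDCRP (MH) | – | – | – | – | 0.998 | – | – | – |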
After removing clusters with a single trajectory and ignoring the initial samples, methods based on DDCRP discovered 19 clusters for both the CROSS and the Lankershim datasets. Figure 4 shows the discovered clusters in the 100th sample for the CROSS and Lankershim datasets. The results shown in this figure are obtained by DDCRP using LCSS and MH distances for the CROSS and Lankershim datasets respectively. The discovered clusters are typical activities in an intersection and include crossing the intersection, turning left, turning right, and making a U-turn.
As discussed in the Trajectory analysis with distance dependent CRP section, the size of the grid impacts the accuracy of any PTM-based trajectory analysis system. Another advantage of the proposed method compared to the Dual-HDP method is that it is less sensitive to the choice of grid size. This is due to the fact that most PTM models, including dual-HDP, are based only on the bag-of-grids representation of the trajectories. The proposed method, however, uses both the bag-of-grids representation and the pairwise distances between raw trajectories. Therefore, it can be expected that the proposed method is less sensitive to the choice of grid size.
Figure 5 shows the average CCR of the DDCRP and dual-HDP systems for different grid sizes. Grid sizes of 25 × 24 and 10 × 10 pixels produce correct clustering rates of 0.801 and 0.974 for the dual-HDP method on the CROSS and Lankershim datasets respectively. However, its accuracy decreases substantially when the grid size is increased or decreased. The proposed method is more robust to the choice of grid size since the pairwise distance between trajectories is independent of the grid size.
The aim of trajectory path modelling is to discover the paths commonly taken by objects in each cluster. One benefit of our method is its ability to model the path simultaneously with trajectory clustering. In our study, each path is characterized by a distribution over the grid cells in a scene. Each cell of a path is associated with a number in the range of 0 to 1, where 0 marks cells that have no chance of being observed in that path. As the value of a cell gets closer to 1, the cell becomes more essential for the path, and the probability of it being passed by trajectories belonging to that path increases.
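One simple way to obtain such a per-cluster distribution from bag-of-grids vectors is sketched below. This is an illustration rather than the estimator used in the experiments: it assumes a symmetric Dirichlet prior with a hypothetical concentration `alpha` and returns the posterior mean of the multinomial parameters. Setting `alpha=0` gives the raw relative frequencies, in which unvisited cells are exactly 0, provided the cluster contains at least one observation.

```python
def path_model(bags, alpha=1.0):
    """Distribution over grid cells for one cluster from its bag-of-grids vectors."""
    num_cells = len(bags[0])
    # Pool the cell counts of all trajectories assigned to the cluster.
    pooled = [sum(bag[s] for bag in bags) for s in range(num_cells)]
    total = sum(pooled) + num_cells * alpha
    # Posterior mean under a symmetric Dirichlet(alpha) prior (assumption).
    return [(count + alpha) / total for count in pooled]


# Two toy trajectories over three grid cells.
smoothed = path_model([[2, 0, 1], [2, 1, 0]])
frequencies = path_model([[2, 0, 1], [2, 1, 0]], alpha=0.0)
```

The resulting values sum to 1 over the grid cells, and the cell visited most often by the trajectories of the cluster receives the largest value.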
The path modelling experiments were conducted with the same parameter setup discussed earlier in this section. Figure 6 shows the cluster models for the CROSS and Lankershim datasets. The blue cells are less likely to be observed by trajectories in that cluster. Conversely, the red cells are more likely to be observed by trajectories. Most paths have their most probable grid cells in the middle of their route, while the probability of grid cells decreases towards the edges of the route.
Conclusion
This paper proposed an unsupervised approach for trajectory clustering and modelling. The generative process of trajectory analysis was modelled via a probabilistic model. The pairwise distances were used as the prior in DDCRP to promote the clustering of similar trajectories. The DDCRP was used to combine the advantages of similarity-based and PTM-based approaches. Compared to probabilistic topic approaches, our method is able to model the full path taken by agents in each cluster. Unlike most similarity-based methods, our method determines the number of clusters automatically. The proposed trajectory analysis system clusters the trajectories and models the clusters’ paths at the same time. Specifically, raw trajectories were converted to a bag-of-grids representation, and each cluster was characterised by its distribution over the grid cells. Experimental results confirmed the effectiveness and usefulness of the proposed algorithm in trajectory clustering and modelling compared to other methods. In future work, the proposed approach is planned to be extended with an online learning capability, where the cluster and path models are updated as more data are observed.