A prediction and imputation method for marine animal movement data
 Academic Editor
 Qichun Zhang
 Subject Areas
 Artificial Intelligence, Social Computing, Spatial and Geographic Information Systems
 Keywords
 Marine animal movement, Trajectory analysis, Prediction, Imputation
 Copyright
 © 2021 Li et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
Li et al. 2021. A prediction and imputation method for marine animal movement data. PeerJ Computer Science 7:e656 https://doi.org/10.7717/peerj-cs.656
Abstract
Data prediction and imputation are important parts of marine animal movement trajectory analysis as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine data analysis are still limited. In this research, we propose a composite deep learning model to improve the accuracy of marine animal trajectory prediction and imputation. The model extracts patterns from the trajectories with an encoder network and reconstructs the trajectories from these patterns with a decoder network. We also use an attention mechanism to highlight salient extracted patterns for the decoder, and we feed these patterns into a second decoder for prediction and imputation. Our approach therefore couples unsupervised learning (with the encoder and the first decoder) and supervised learning (with the encoder and the second decoder). Experimental results demonstrate that our approach can reduce errors by at least 10% on average compared with other methods.
Introduction
With the advancement in tracking devices, vast amounts of trajectory data have been collected. As a consequence, research in trajectory data prediction, clustering, and imputation is proliferating. The latest developments in position tracking and data analysis techniques have dramatically changed the way researchers study wildlife movements. Interdisciplinary collaborations have led to the development of new quantitative methods and tools that have become key to animal movement research and allow for enhanced and extensive interpretation of the results (Jonsen, Flemming & Myers, 2005; Johnson et al., 2008; Ma et al., 2020). Because animals obtain resources such as prey and mates through movements, their movement patterns can contain essential biological information. Thus, analyzing animal data obtained from remote sensing technology can help researchers determine the places animals prefer, understand their migration strategies, and enhance the effectiveness of protecting endangered species (Calenge, Dray & Royer-Carenzi, 2009).
Recent research has shown that marine animals vary significantly in their movement patterns in response to various physical and biological factors. For example, by investigating a multi-year database of female southern elephant seal motion behaviors, some studies have shown that the preference of female seals for mesoscale ocean circulation is seasonally flexible (Cotté et al., 2015). Statistical data analysis has also revealed a link between elephant seal behavior and ocean patterns and suggested that pre-reproductive female southern elephant seals prefer to forage near mesoscale fronts (Campagna et al., 2006). These examples show that a time-varying trajectory analysis model is crucial because it can reveal unknown information in ecological data and provide models for observations. One simple way to achieve this is to let the model output depend on previous input values, which some deep learning approaches support naturally.
Deep learning methods have been successfully used in many applications. In image classification and object detection, methods based on deep convolutional neural networks can achieve excellent results (Perez & Wang, 2017; Zhao et al., 2019). In time series analysis, methods based on recurrent neural networks perform well (Connor, Martin & Atlas, 1994). Researchers have also found that recurrent neural networks have an advantage over feedforward neural networks on time series and achieve better results on electric load forecasting (Connor, Martin & Atlas, 1994). To extract patterns in an unsupervised way, researchers have proposed autoencoders, which reconstruct input data and learn patterns simultaneously (Vincent et al., 2008).
However, most trajectory analysis research using deep learning tools usually focuses on human trajectories (Ma et al., 2019; Rudenko et al., 2020), which are quite regular on a daily basis. As marine animal trajectories can have very different patterns, many existing approaches are not applicable. In this work, we propose to model marine animal trajectories based on encoding and decoding modules for prediction and imputation. Our contributions are as follows:
First, we propose a deep learningbased approach for marine animal trajectory data analysis, specifically, prediction and imputation within the same framework.
Second, we design a learning model integrating recurrent neural networks and autoencoder networks along with attention modules to model marine animal trajectory data with better accuracy.
Third, our model utilizes hidden patterns of trajectories from encoders to improve prediction and imputation accuracy.
The remaining parts are organized as follows. In ‘Related works’, we discuss the interaction between trajectories and the environment and the advantages of recurrent neural networks for time series problems. In ‘Method’, we describe our model in detail and explain how the data is transformed in our model. In ‘Experiments’, we compare our model with other algorithms and preprocess the data in two different ways to demonstrate our method’s performance and efficiency. We conclude this work in ‘Conclusion’.
Related Works
Animal trajectories are generally affected by animal behaviors as well as situational and environmental factors. Therefore, it is not suitable to describe these trajectories with specific distributions, and flexible nonlinear models are preferable for identifying underlying patterns.
Many machine learning methods have been used to analyze movement data for cows (Martiskainen et al., 2009), cheetahs (Grünewälder et al., 2012), penguins (Carroll et al., 2014), etc. For example, random forest is widely used for movement data prediction or imputation (Zhang et al., 2020; Lin et al., 2017; He et al., 2019). State-space models (Breed et al., 2012), hidden Markov models (Michelot, Langrock & Patterson, 2016), and Gaussian mixture models have also been used extensively in identifying and modeling telemetry data (Gibb et al., 2017; Jonsen et al., 2018; Langrock et al., 2012). Across many of these cases, particular patterns have usually been manually extracted from the data to simplify the predictive task (Jonsen et al., 2018).
Artificial neural networks are another class of feasible methods. Such models have been used to estimate the movement probability of elks by considering the physical spatial structure of landscapes and animal memory of previously visited locations (Dalziel, Morales & Fryxell, 2008). Artificial neural networks can also identify and predict diving activities of seabirds (Browning et al., 2018). If inputs are sequences, a special type of artificial neural network, recurrent neural networks, can be used as they can learn the implicit temporal dependencies in sequential or spatial–temporal data. They have shown clear advantages in dealing with problems such as time series prediction (Connor & Atlas, 1991), speech recognition (Graves, Mohamed & Hinton, 2013), subtitle generation (Song et al., 2019), image or video classification (Yang, Krompass & Tresp, 2017), and handwriting sequences (Graves, 2013). Recurrent neural networks can also predict image sequences and perform well in action recognition when combined with autoencoders (Srivastava, Mansimov & Salakhudinov, 2015). Bidirectional recurrent neural networks can also be used for machine translation (Cho et al., 2014; Graves & Jaitly, 2014). Some studies have also used recurrent neural networks with random forest interpolation for pattern refinement to improve the prediction performance of recurrent neural networks (Rew et al., 2019).
To further improve the prediction and imputation performance, in this work, we propose to use an encoder and one decoder for trajectory embedding and use the other decoder for trajectory prediction and imputation. Experimental results justify the effectiveness.
Method
Movement analysis framework
Autoencoders are usually used for unsupervised learning, which requires unlabeled data only. In this work, we propose a novel framework that integrates autoencoders, recurrent neural networks, and attention modules, to improve the prediction and imputation performance for marine animal trajectories. The proposed framework differs from traditional approaches as it has an attention module for the encoder output, and it has two decoders for two purposes, as shown in Fig. 1. The first decoder can reconstruct input data and learn patterns through the reconstruction process, while the second one can perform trajectory prediction and imputation from learned patterns.
LSTM encoder
Long short-term memory (LSTM) networks are a kind of recurrent neural network suitable for processing and predicting events across relatively long intervals in time series. In terms of performance, LSTM networks are usually superior to ordinary recurrent neural networks (Gers, Schmidhuber & Cummins, 1999). Here, we briefly describe the basic building block, an LSTM cell (Graves, 2013). An LSTM cell differs from a typical recurrent neural cell in that it controls the flow of information through input gates, forget gates, and output gates.
In this part, we use a T × F matrix x to represent an input trajectory with T time steps and F features, and a row vector x_{t} to represent the trajectory features at time step t. Similarly, we use h, f, i, and o to represent the hidden, forget, input, and output states respectively, with subscript t denoting these values at time step t. In an LSTM cell at time step t, h_{t−1} and x_{t} are used to calculate the forget state f_{t}, input state i_{t}, output state o_{t}, and candidate cell state ${\tilde{C}}_{t}$, as shown in Eqs. (1) to (4). In these equations, W_{f}, W_{i}, W_{o}, and W_{C} are weight matrices, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function. Then, we combine the previous cell state C_{t−1} and the candidate cell state ${\tilde{C}}_{t}$, weighted by the forget state and input state respectively, as shown in Eq. (5). The hidden state h_{t} is updated with the output state o_{t} and the current cell state C_{t}, as shown in Eq. (6).
(1)$f_{t}=\sigma\left(W_{f}\cdot\left[h_{t-1},x_{t}\right]+b_{f}\right),$ (2)$i_{t}=\sigma\left(W_{i}\cdot\left[h_{t-1},x_{t}\right]+b_{i}\right),$ (3)$o_{t}=\sigma\left(W_{o}\cdot\left[h_{t-1},x_{t}\right]+b_{o}\right),$ (4)$\tilde{C}_{t}=\tanh\left(W_{C}\cdot\left[h_{t-1},x_{t}\right]+b_{C}\right),$ (5)$C_{t}=f_{t}\ast C_{t-1}+i_{t}\ast \tilde{C}_{t},$ (6)$h_{t}=o_{t}\ast \tanh\left(C_{t}\right).$ We use T LSTM cells to form an encoder layer, and the kth layer is represented as Le^{(k)}(⋅). The input of the first layer is x, and the input of each subsequent layer is the output of the previous layer. The output of each layer is the hidden states of the LSTM cells in that layer. Thus, the encoder can be written as follows: (7)$hen^{(k)}=\begin{cases}Le^{(k)}\left(x\right), & k=1\\ Le^{(k)}\left(hen^{(k-1)}\right), & k>1\end{cases}$ where hen^{(k)} represents the hidden states of the LSTM cells in the kth layer. If the hidden state of each LSTM cell is of size M, hen^{(k)} is of size T × M. We define K as the total number of encoding layers.
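As a concrete illustration, the cell updates in Eqs. (1)–(6) can be sketched in NumPy as below. The weight initialization, toy dimensions (T = 5, F = 3, M = 4), and single-layer loop are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following Eqs. (1)-(6).

    W is a dict of weight matrices W_f, W_i, W_o, W_C, each of shape
    (M, M + F); b holds the corresponding bias vectors.
    """
    concat = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])      # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ concat + b["i"])      # input gate, Eq. (2)
    o_t = sigmoid(W["o"] @ concat + b["o"])      # output gate, Eq. (3)
    C_tilde = np.tanh(W["C"] @ concat + b["C"])  # candidate cell state, Eq. (4)
    C_t = f_t * C_prev + i_t * C_tilde           # cell state update, Eq. (5)
    h_t = o_t * np.tanh(C_t)                     # hidden state, Eq. (6)
    return h_t, C_t

# Encode a toy trajectory of T=5 steps with F=3 features and M=4 hidden units
rng = np.random.default_rng(0)
T, F, M = 5, 3, 4
W = {k: rng.normal(scale=0.1, size=(M, M + F)) for k in "fioC"}
b = {k: np.zeros(M) for k in "fioC"}
x = rng.normal(size=(T, F))
h, C = np.zeros(M), np.zeros(M)
hidden_states = []
for t in range(T):
    h, C = lstm_cell_step(x[t], h, C, W, b)
    hidden_states.append(h)
hen = np.stack(hidden_states)  # T x M hidden-state matrix of one encoder layer
```

Stacking the T hidden states yields exactly the T × M matrix hen^{(k)} of Eq. (7) for one layer.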
Attention module
In this part, we integrate the encoder output with an attention module (Luong, Pham & Manning, 2015; Yang et al., 2016) so that the decoders can focus on important hidden patterns.
To build the module, we first perform a fully connected transformation of the encoder output to get a transformed state matrix $\bar{h}$: (8)$\bar{h}=FCN\left(hen^{(K)}\right),$ where FCN(⋅) represents the fully connected layer over hidden states, and the transformed state matrix $\bar{h}$ is of size T × M. We use hl to represent the last row of hen^{(K)}. We obtain the attention score using $score\left(hl,\bar{h}_{t:}\right)$, which is simply the dot product of the two vectors. After normalization with a softmax function, we obtain the attention weight vector aw of length T, in which each element is defined as follows: (9)$aw_{t}=\frac{\exp\left(score\left(hl,\bar{h}_{t:}\right)\right)}{\sum_{t^{\prime}}\exp\left(score\left(hl,\bar{h}_{t^{\prime}:}\right)\right)},$ where $\bar{h}_{t:}$ is the tth row of the transformed state matrix $\bar{h}$.
Finally, we multiply the attention weights aw with $\bar{h}$ to obtain the attention vector: (10)$av=aw^{T}\bar{h},$ where av is a vector of length M.
The attention vector av is concatenated with hl, and fed into another fully connected layer to produce the final hidden pattern: (11)$ha=FCN\left(\left[av,hl\right]\right),$ where ha is the attention output of length 2M.
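A minimal NumPy sketch of Eqs. (8)–(11) follows. The random matrices W_fc and W_out stand in for the trained fully connected layers and are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(hen, W_fc, W_out):
    """Attention over encoder hidden states, following Eqs. (8)-(11).

    hen: T x M encoder output; W_fc: M x M transform for Eq. (8);
    W_out: 2M x 2M transform for Eq. (11). Both stand in for FCN layers.
    """
    h_bar = hen @ W_fc                      # transformed state matrix, Eq. (8)
    hl = hen[-1]                            # last row of the encoder output
    scores = h_bar @ hl                     # dot-product score(hl, h_bar_t)
    aw = softmax(scores)                    # attention weights, Eq. (9)
    av = aw @ h_bar                         # attention vector of length M, Eq. (10)
    ha = np.concatenate([av, hl]) @ W_out   # attention output of length 2M, Eq. (11)
    return aw, ha

rng = np.random.default_rng(1)
T, M = 7, 4
hen = rng.normal(size=(T, M))
aw, ha = attention(hen, rng.normal(size=(M, M)), rng.normal(size=(2 * M, 2 * M)))
```

The weights aw sum to one over the T time steps, so ha is a convex mixture of transformed hidden states concatenated with the final state.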
LSTM decoders for trajectory reconstruction and prediction/imputation
A traditional autoencoder model can be used for unsupervised learning and can identify hidden patterns in trajectory series. In this work, we use a dual-decoder model to enable supervised learning while utilizing the hidden patterns.
We use Ld^{(k,1)} and Ld^{(k,2)} to represent the kth LSTM layers of the first and the second decoders, respectively. Corresponding to the encoder in Eq. (7), the structure of the two decoders is as follows: (12)$hd^{(k,1)}=\begin{cases}Ld^{(k,1)}\left(ha\right), & k=1\\ Ld^{(k,1)}\left(hd^{(k-1,1)}\right), & k>1,\end{cases}$ (13)$hd^{(k,2)}=\begin{cases}Ld^{(k,2)}\left(ha\right), & k=1\\ Ld^{(k,2)}\left(hd^{(k-1,2)}\right), & k>1,\end{cases}$ where hd^{(k,1)} represents the hidden states of the LSTM cells in the kth layer of the first decoder, and hd^{(k,2)} represents that of the second decoder. The first decoder is used for reconstruction as usual so that it can help the encoder extract meaningful patterns from trajectories. Based on these patterns, the second decoder performs supervised learning, namely, prediction or imputation for the model input.
We use the outputs of the last layers of the two decoders to compute the model outputs, and thus, if there are K decoder layers, we have (14)$\widehat{x}=FCN\left(hd^{(K,1)}\right),$ (15)$\widehat{y}=FCN\left(hd^{(K,2)}\right),$ where $\widehat{x}$ is the reconstruction of the input data, and $\widehat{y}$ is the prediction or imputation result.
Loss function
We choose the mean square error to construct the loss function for the whole framework. The loss function combines the reconstruction error and the prediction or imputation error. If y is the target label for input sequence x, with reconstruction sequence $\widehat{x}$ and prediction or imputation output $\widehat{y}$, the objective of this model is to minimize the loss function: (16)$\min L=\sum_{j=1}^{n}\left[\left(x^{(j)}-\widehat{x}^{(j)}\right)^{2}+\left(y^{(j)}-\widehat{y}^{(j)}\right)^{2}\right],$ where n is the number of trajectory segments and j indexes the jth input segment.
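As a sketch, the combined loss of Eq. (16) can be written as follows; the toy arrays are illustrative values, not data from the paper:

```python
import numpy as np

def combined_loss(x, x_hat, y, y_hat):
    """Sum of reconstruction error and prediction/imputation error, Eq. (16)."""
    return float(np.sum((x - x_hat) ** 2) + np.sum((y - y_hat) ** 2))

# Toy example: reconstruction error 1, prediction error 5, total 6
x, x_hat = np.array([1.0, 2.0]), np.array([1.0, 1.0])
y, y_hat = np.array([0.0, 3.0]), np.array([1.0, 1.0])
loss = combined_loss(x, x_hat, y, y_hat)  # 6.0
```

Because both terms share the encoder, gradients from the supervised term and the reconstruction term jointly shape the extracted patterns.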
To train the model, we need to minimize the loss. Adam optimizer (Kingma & Ba, 2014) is widely used for many deep learning models, so we also use it to minimize the loss function.
Experiments
Dataset
We use a data set that includes 489,391 h of trajectories from 111 southern elephant seals, with positions obtained from Argos platform transmitter terminals. All procedures to obtain the data were approved by the respective ethics committees and licensing bodies, including the Australian Antarctic Animal Ethics Committee (ASAC 2265, AAS 2794, AAS 4329), the Tasmanian Parks and Wildlife Service, the University of California, Santa Cruz, and the Programa Antártico Brasileiro. All procedures were carried out in accordance with current guidelines and regulations.
Data preprocessing
Our method can take into account position information, namely longitudes and latitudes, obtained from animal trajectories. However, although the data set is quite large, the animals usually appear at different positions, which makes learning directly from raw coordinates difficult. Figure 2 shows such scenarios with four Antarctic elephant seals.
To solve this issue, we feed our algorithm with distance and angle information extracted from trajectories for ease of learning. We use P_{t} to denote the position in longitude and latitude at time t, d_{t} to denote the distance traveled during period t between two data collections, and θ_{t} to indicate the direction of movement. Specifically, d_{t} is the great-circle distance between P_{t} and P_{t+1} calculated with the haversine formula, and θ_{t} is the azimuth (turning) angle from segment P_{t−1}P_{t} to segment P_{t}P_{t+1}. The input x of our model includes the features (d_{t}cosθ_{t}, d_{t}sinθ_{t}, θ_{t}), and the output of our model is (d_{t}cosθ_{t}, d_{t}sinθ_{t}).
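For illustration, the haversine distance, azimuth, and turning-angle features described above can be sketched as follows; the Earth-radius constant and the function names are our own assumptions, not code from the paper:

```python
import math

R_EARTH_KM = 6371.0  # assumed mean Earth radius in kilometers

def haversine_km(p1, p2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))

def bearing_rad(p1, p2):
    """Initial bearing (azimuth from north) from p1 to p2, both (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    y = math.sin(lon2 - lon1) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
    return math.atan2(y, x)

def step_features(p_prev, p_cur, p_next):
    """(d_t cos th_t, d_t sin th_t, th_t) for one step: d_t is the distance
    from p_cur to p_next, th_t the turn between consecutive headings."""
    d = haversine_km(p_cur, p_next)
    theta = bearing_rad(p_cur, p_next) - bearing_rad(p_prev, p_cur)
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to (-pi, pi]
    return d * math.cos(theta), d * math.sin(theta), theta
```

For a seal moving straight east along the equator, the turn angle is zero and the feature vector reduces to (d_t, 0, 0).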
We also slice the trajectory data into segments with a sliding window. Each segment has a certain number of consecutive data points. The number of data points in each segment would vary depending on the experiment.
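A sliding-window slicer of this kind might look like the following sketch; the stride parameter is an assumption, as the paper does not specify how far the window advances:

```python
def slide_segments(seq, window, stride=1):
    """Slice a trajectory into (possibly overlapping) segments of fixed length."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]

# A 10-point trajectory with a window of 4 yields 7 overlapping segments
segs = slide_segments(list(range(10)), 4)
```

Each segment then serves as one training or testing example in the cases below.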
Experiment design
In our experiments, we consider three cases to prepare the training and testing data:

One seal: in this case, each experiment is carried out within one seal’s data. The first half of the seal’s trajectory is used as the training set, and the second half as the testing set.

Five seals: in this case, each experiment is carried out with four seals for training and one seal for testing. Testing seals are not included in the training set.

All seals: in this case, we first extract trajectory segments of all the seals and then randomly shuffle these segments. In the experiment, we use the first half of the shuffled segments for training and the other half for testing.
To evaluate the efficiency of our model with and without attention (LSTM-AE-AT-DD and LSTM-AE-DD, respectively), we choose three other methods for comparison. These models have also been widely used in trajectory prediction and imputation tasks. The first one is a widely used but simple LSTM model with one hidden layer of one hundred neurons for analyzing sequence data.
The second method is a densely connected artificial neural network (ANN) with a hidden layer of one hundred neurons. The third one is a random forest with two hundred decision trees; it is an ensemble method that has proved effective for time series regression. For simplicity, we choose a single-layer encoder and single-layer decoders in our approach.
For evaluation, we select two metrics, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), for the model output d_{t}cosθ_{t} and d_{t}sinθ_{t} when comparing with the groundtruth.
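The two metrics can be computed as in this minimal sketch, applied per output feature against the ground truth:

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error over paired observations."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error over paired observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

RMSE penalizes large deviations more heavily than MAE, which is why the two tables in each case can rank methods slightly differently.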
Data prediction
In this part, we consider the application of data prediction. Given an input trajectory, our model generates location information for time steps following the input sequence. We evaluate the impact of differences in segment length for training and testing. The notations are shown in Table 1. For example, T_{7}P_{1} means that we use the first seven time steps of a segment as input, and the model produces results for the eighth time step. We also compare our approach with other methods to evaluate its performance.
Table 1: Subcases for data prediction.
Subcases  Length for input  Length for output 

T_{7}P_{1}  7  1 
T_{7}P_{4}  7  4 
T_{12}P_{7}  12  7 
Case 1: One Seal
In this case, for each experiment, we use trajectory segments from one seal for training and testing. We use 80% of the data for training and the remaining 20% for testing. We carry out one experiment for each seal and then calculate the average performance over all the experiments. Results are shown in Tables 2 and 3. Compared with other methods in Table 2, the average MAE of LSTM-AE-AT-DD is 19.47% less than that of LSTM, 71.81% less than that of ANN, and 51.49% less than that of Random forests. From Table 3, we can find that the average RMSE of LSTM-AE-AT-DD is 22.57% less than that of LSTM, 62.87% less than that of ANN, and 46.40% less than that of Random forests. These results demonstrate the effectiveness of our model. Example prediction results by our approach are shown in Fig. 3.
Table 2: Average MAE for prediction, Case 1 (one seal).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  237.319  250.440  269.904  1163.260  583.062 
T_{7}P_{4}  323.735  349.914  375.148  1162.241  657.521 
T_{12}P_{7}  420.102  462.545  473.786  1152.049  780.859 
Table 3: Average RMSE for prediction, Case 1 (one seal).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  387.995  408.143  484.409  1448.455  860.921 
T_{7}P_{4}  503.205  806.414  701.441  1514.741  937.436 
T_{12}P_{7}  657.439  753.896  813.486  1504.378  1089.780 
Case 2: Five Seals
In this case, the seals’ data are randomly divided into multiple groups, with each group including trajectory segments from five seals. We use one group for each experiment and choose segments from four seals in the group as training data and segments from the remaining seal as testing data. We carry out experiments for all the groups and calculate the average performance. Results are shown in Tables 4 and 5. Compared with other methods in Table 4, the average MAE of LSTM-AE-AT-DD is 11.88% less than that of LSTM, 75.90% less than that of ANN, and 23.31% less than that of Random forests. From Table 5, we can find that the average RMSE of LSTM-AE-AT-DD is 22.13% less than that of LSTM, 72.22% less than that of ANN, and 20.54% less than that of Random forests. These results demonstrate the effectiveness of our model. Example segments are shown in Fig. 4.
Table 4: Average MAE for prediction, Case 2 (five seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  168.918  187.926  198.6611  946.406  239.127 
T_{7}P_{4}  249.005  270.423  277.894  1021.269  316.594 
T_{12}P_{7}  310.115  343.232  350.289  1050.041  393.695 
Table 5: Average RMSE for prediction, Case 2 (five seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  277.966  348.854  347.808  1375.048  389.258 
T_{7}P_{4}  408.962  460.440  548.086  1458.451  495.996 
T_{12}P_{7}  522.188  605.666  655.394  1511.277  635.0766 
Table 6: Average MAE for prediction, Case 3 (all seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  1362.880  1510.908  1627.762  1973.107  1823.014 
T_{7}P_{4}  1350.552  1552.378  1587.598  1957.022  1846.123 
T_{12}P_{7}  1544.437  1562.605  1569.930  1947.542  1860.650 
Table 7: Average RMSE for prediction, Case 3 (all seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{7}P_{1}  1910.567  2175.505  2254.435  2411.910  2289.266 
T_{7}P_{4}  1802.803  2061.795  2124.295  2434.975  2311.551 
T_{12}P_{7}  2039.647  2075.911  2083.032  2398.189  2355.768 
Case 3: All Seals
In this case, we use all the segments from all the seals in the experiment. We randomly choose half of the segments for training and the other half for testing. Results are shown in Tables 6 and 7. Compared with other methods in Table 6, the average MAE of LSTM-AE-AT-DD is 11.02% less than that of LSTM, 27.58% less than that of ANN, and 23.02% less than that of Random forests. From Table 7, we can find that the average RMSE of LSTM-AE-AT-DD is 10.99% less than that of LSTM, 20.60% less than that of ANN, and 17.31% less than that of Random forests. These results demonstrate the effectiveness of our model. Example segments are shown in Fig. 5.
Table 8: Subcases for data imputation.
Subcases  Length for input  Length for output 

T_{1}P_{1}  1  1 
T_{7}P_{7}  7  7 
T_{14}P_{14}  14  14 
Table 9: Average MAE for imputation, Case 1 (one seal).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  281.664  208.968  240.295  814.662  505.864 
T_{7}P_{7}  236.992  236.997  257.502  997.410  706.942 
T_{14}P_{14}  243.321  286.144  301.394  946.278  879.690 
Table 10: Average RMSE for imputation, Case 1 (one seal).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  507.709  389.9149  441.229  1136.288  799.169 
T_{7}P_{7}  405.793  420.320  457.572  1415.251  1001.373 
T_{14}P_{14}  402.579  499.870  537.589  1286.640  1177.041 
Data imputation
In this part, we consider the application of data imputation, i.e., generating missing data points for given sequences. We carry out a comprehensive evaluation with three different cases. The output segments have the same length as the input segments, as shown in Table 8. For example, notation T_{7}P_{7} means that we use a total of fourteen time steps, with the seven steps at times {1, 3, 5, 7, 9, 11, 13} as input and the other seven steps at times {2, 4, 6, 8, 10, 12, 14} as output. We also compare our approach with other methods to evaluate its performance.
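The odd/even interleaving described above can be sketched as follows; the function name is hypothetical:

```python
def interleave_split(segment):
    """Split a segment into observed inputs (odd time steps) and
    missing targets (even time steps), as in the T7P7 imputation setup."""
    return segment[0::2], segment[1::2]

# Fourteen time steps labelled 1..14, following the paper's example
observed, missing = interleave_split(list(range(1, 15)))
```

The model sees only the observed half and is scored on how well it recovers the interleaved missing half.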
Case 1: One Seal
In this case, for each experiment, we use one seal for training and testing, with the sequence length set to 1, 7, and 14, respectively. For each seal, we use 80% of the segments for training and the remaining 20% for testing. We carry out such experiments for all the seals and calculate the average performance. Results are shown in Tables 9 and 10. Compared with other methods, both of our approaches are effective, but LSTM-AE-AT-DD is not as good as LSTM-AE-DD for T_{1}P_{1}, which is reasonable because an input segment of length one is too short for the attention mechanism to work. From Table 9, we can find that the average MAE of LSTM-AE-AT-DD is 8.52% less than that of LSTM, 73.52% less than that of ANN, and 65.07% less than that of Random forests. From Table 10, we can find that the average RMSE of LSTM-AE-AT-DD is 8.85% less than that of LSTM, 65.91% less than that of ANN, and 56.06% less than that of Random forests. Example imputation results are shown in Fig. 6.
Case 2: Five Seals
In this case, the seals’ data are randomly divided into multiple groups, with each group including trajectory segments from five seals. We use one group for each experiment and choose segments from four seals in the group as training data and segments from the remaining seal as testing data. We carry out experiments on all the groups and calculate the average performance. Results are shown in Tables 11 and 12. Comparisons with other methods prove the effectiveness of our approach, and as before, LSTM-AE-AT-DD is not as good as LSTM-AE-DD for T_{1}P_{1} because an input segment of length one is too short for the attention mechanism to work. From Table 11, we can find that the average MAE of LSTM-AE-AT-DD is 40.33% less than that of LSTM, 75.34% less than that of ANN, and 31.49% less than that of Random forests. From Table 12, we can find that the average RMSE of LSTM-AE-AT-DD is 25.89% less than that of LSTM, 70.49% less than that of ANN, and 24.77% less than that of Random forests. Example segments are shown in Fig. 7.
Table 11: Average MAE for imputation, Case 2 (five seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  190.748  149.283  151.353  918.250  260.823 
T_{7}P_{7}  159.890  176.549  192.877  934.221  271.680 
T_{14}P_{14}  345.679  450.622  820.294  962.841  482.425 
Table 12: Average RMSE for imputation, Case 2 (five seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  366.756  291.972  300.275  1385.303  476.676 
T_{7}P_{7}  308.363  336.048  365.393  1403.790  482.591 
T_{14}P_{14}  571.478  897.523  1015.749  1431.606  697.706 
Case 3: All Seals
In this case, we use all the segments from all the seals in the experiment. We randomly choose half of the segments for training and the other half for testing. Results are shown in Tables 13 and 14. Comparisons with other methods prove the effectiveness of our approach, especially for imputation over long sequences. In this experiment, LSTM-AE-DD is always slightly better than LSTM-AE-AT-DD, probably because the behaviors of different seals diverge, making it difficult for the attention mechanism to capture patterns of all the seals properly. From Table 13, we can find that the average MAE of LSTM-AE-AT-DD is 47.70% less than that of LSTM, 79.06% less than that of ANN, and 78.55% less than that of Random forests. From Table 14, we can find that the average RMSE of LSTM-AE-AT-DD is 28.03% less than that of LSTM, 52.78% less than that of ANN, and 52.68% less than that of Random forests. Example segments are shown in Fig. 8.
Table 13: Average MAE for imputation, Case 3 (all seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  203.770  200.627  201.826  903.516  894.184 
T_{7}P_{7}  209.046  200.736  207.016  984.989  941.229 
T_{14}P_{14}  206.405  204.781  747.974  997.401  981.847 
Table 14: Average RMSE for imputation, Case 3 (all seals).
LSTM-AE-AT-DD  LSTM-AE-DD  LSTM  ANN  Random forests  

T_{1}P_{1}  758.635  736.740  751.158  1503.748  1500.653 
T_{7}P_{7}  752.292  731.636  774.091  1581.619  1579.469 
T_{14}P_{14}  739.197  718.632  1511.687  1543.229  1538.450 
Conclusions
Trajectory prediction and imputation are essential in analyzing trajectory data. In this work, we propose an approach that utilizes autoencoders and attention modules to extract important hidden patterns and then uses an additional decoder for estimation. This approach can overcome the drawbacks of pure prediction or imputation networks. The proposed attention module over the hidden patterns further selects critical patterns for the decoders, and thus improves prediction and imputation results. In the experiments, our model performs better than the others, which demonstrates the effectiveness of our approach. This method can serve a wide range of applications for biologists and ecologists.