Fishing operation type recognition based on multi-branch convolutional neural network using trajectory data

Bohui Jiang; Weifeng Zhou

doi:10.7717/peerj-cs.3020

Fishing operation type recognition based on multi-branch convolutional neural network using trajectory data

Bohui Jiang^1,2, Weifeng Zhou ¹

1East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shanghai, China

2Shanghai Ocean University, Shanghai, China

DOI: 10.7717/peerj-cs.3020

Published: 2025-07-22
Accepted: 2025-06-19
Received: 2024-05-24

Academic Editor: Martina Iammarino

Subject Areas: Artificial Intelligence, Data Mining and Machine Learning, Spatial and Geographic Information Systems, Neural Networks
Keywords: Geohash, Vessel trajectory, Deep convolutional neural network, Deep learning, Spatiotemporal context, Fishing operation type, Embedding

Copyright: © 2025 Jiang and Zhou
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Jiang B, Zhou W. 2025. Fishing operation type recognition based on multi-branch convolutional neural network using trajectory data. PeerJ Computer Science 11:e3020 https://doi.org/10.7717/peerj-cs.3020

The authors have chosen to make the review history of this article public.

Abstract

Accurate identification of fishing vessel operations is vital for sustainable fishery management. Existing methods inadequately exploit spatiotemporal contextual information in vessel trajectories and fail to effectively fuse multimodal data. To address this, this study proposes a novel framework integrating Geohash-based geocoding with embedding techniques inspired by natural language processing to extract spatiotemporal features from trajectory sequences. We develop a multi-branch 1D convolutional neural network (MB-1dCNN) to minimize feature engineering dependency while enhancing operational-type recognition. Comparative experiments evaluate Geohash encoding lengths and network architectures (single-branch vs. multi-branch, fully-connected vs. 1D-CNN). Results indicate optimal Geohash encoding at length 5. The multi-branch structure significantly outperforms single-branch counterparts, and MB-1dCNN demonstrates superior performance over multi-branch model with fully connected layers (MB-FCNN), achieving additional gains in accuracy and F1-score. Key findings reveal: (1) 1D-CNN processing surpasses fully-connected networks in sequential feature extraction, (2) Multi-branch architectures enhance information fusion capabilities. The proposed MB-1dCNN establishes state-of-the-art performance for trajectory-based fishing operation recognition, offering valuable insights for spatial computing applications in maritime surveillance.

Introduction

As terrestrial aquatic resources are increasingly exploited, oceans, which cover over 70% of the Earth’s surface, are gaining attention. Consequently, the exploitation of marine resources has accelerated. Scientific progress has been made in fish stock planning and utilization (FAO, 2020; Xu et al., 2021). Various fishing methods impact marine fishery resources differently; particularly, excessive fishing activities severely damage specific marine populations (Rousseau et al., 2019). To conduct commercial activities, fishing vessel owners must apply for permits and register their operational types (i.e., fishing methods) with competent authorities. However, fishing vessels often operate illegally, violating local or international rules and harming fishery resources and the marine ecological environment (Kassawmar et al., 2018; Rockström, Barron & Fox, 2002). Traditionally, marine officers on patrol boats monitor fishing violations through boarding and inspecting fishing vessels. Yet, the large number of active fishing vessels, their unpredictable movements, and long operational periods make such monitoring challenging and expensive, requiring a substantial marine patrol workforce. Consequently, practical management of fishing vessel operations is limited. In this context, accurate identification of fishing operational types or activities using technology is critical for promoting better management of commercial fishing vessels and the sustainable development of fishery resources.

Vessel tracking provides location information, increasing transparency in fishing activities (Orofino et al., 2023). Vessel monitoring systems (VMSs) and automatic identification systems (AISs) have been applied in fishing vessel management (Mesquita et al., 2024; Iacarella et al., 2023). However, existing methods fail to fully utilize the spatiotemporal contextual information contained in the fishing-vessel trajectories, and poorly fuse multiple information of inputs for fishing-vessel operational-type identification. Considering the different ways in which input features are processed, the research motivation of the study is to reduce the dependence of previous methods on feature engineering and exploiting extra spatiotemporal contextual information to build a new and relatively good acceptable neural network model to do the recognition of fishing vessel operation-types. The approach proposed in this research utilizes the spatioltemporal contextual feature extraction of trajectory sequences. Additionally, it contributes to the extraction and fusion of multiple-input feature information, which may be advantageous for applications of spatial computation and analysis.

Related work

Fishing vessel trajectory data from VMSs or AISs, in combination with expert knowledge and machine learning techniques, can facilitate the identification of various fishing operational types or activities. VMSs record real-time data for fishing vessel trajectory points, including longitude, latitude, speed, heading, and timestamp. This enables the rapid acquisition of positional data over large areas (Walker & Bez, 2010; Vermard et al., 2010; Zhou et al., 2015), making VMSs crucial for monitoring and managing fishing efforts. The speed and fishing activity of a single vessel vary across different statuses, such as sailing and operation. Natale et al. (2015) and Zheng et al. (2016) conducted analyses of VMS trajectory data to identify patterns. Similarly, Gao et al. (2020) and Huang et al. (2019) extracted features from VMS trajectory data and employed classical machine learning algorithms—like XGBoost, support vector machines, random forests, and k-nearest neighbors—to recognize fishing activity patterns. Li et al. (2021b) applied a Gaussian mixture model to mine characteristic thresholds based on the statistical frequency distribution of fishing vessel operating speeds. Yang et al. (2022) constructed 17-dimensional vessel characteristics related to spatial, temporal, and behavioral parameters and used a three- layer bidirectional long short-term memory (BiLSTM) network to classify fishing and non-fishing vessels. Li et al. (2021a) extracted a 42-dimensional feature vector based on a LightGBM-BiLSTM model to effectively identify fishing activity patterns. Natale et al. (2015) developed a Gaussian mixture model to identify fishing behavior from AIS data. By setting speed distribution thresholds, they categorized trajectories into navigation, fishing, and anchoring behaviors, thereby distinguishing fishing from non-fishing states. Zhang et al. (2021) analyzed fishing vessel behavior using vessel speed, sailing time, trajectory, and fishing effort. This allowed for the direct identification of fish stock locations and the statistical analysis of fishing output and resource distribution. Ashrafi, Tessem & Enberg (2023) utilized AIS data along with several deep learning architectures to detect likely unreported fishing activities, with a particular focus on bottom trawlers.

All these approaches rely on statistical analyses of the fishing industry’s state to obtain parameters such as possible thresholds. Furthermore, feature engineering is performed to identify different fishing operational types. Over 10 features are constructed from trajectory data. Vermard et al. (2010) proposed a Bayesian hierarchical model based on hidden Markov processes. Using speed and movement transfer as parameters, this model predicts vessel states of different behaviors, including fishing, sailing, and docking. Feng et al. (2019) used VMS data and BP neural networks, with direction angle and speed changes as inputs, to effectively identify fishing behaviors. Gao et al. (2020) developed a feature fusion algorithm by extracting features from VMS data to determine the operation type.

However, these methods are highly dependent on thresholds or feature engineering. The accuracy of pattern identification is determined by reasonable threshold settings and valid feature extraction. Determining these thresholds or features requires a certain level of expert knowledge, as parameters vary with different fishing activity patterns. Consequently, these methods have limitations that hinder their generalized application.

Compared to classical machine learning, which requires the manual construction of multiple features, deep learning offers the advantage of end-to-end learning (LeCun, Bengio & Hinton, 2015). Deep-learning models can automatically learn features from datasets using network structures, thereby reducing reliance on expert knowledge (Janiesch, Zschech & Heinrich, 2021). As a result, deep learning is increasingly being applied to various tasks. Notably, Kim & Lee (2020) used convolutional neural network (CNN) models trained on vessel trajectory data to distinguish fishing from non-fishing vessels. Additionally, Tang et al. (2020) generated a track map of vessel trajectory data and classified gillnet and trawl-net types using the Visual Geometry Group (VGG)-16 model. All these methods demonstrated plausible performance in fishing-activity pattern identification.

A trajectory sequence consists of continuous track points in time and space. In addition to the spatial and temporal information of the track points, the trajectory sequence contains spatiotemporal contextual information (Garani & Adam, 2020). However, the methods mentioned above do not fully utilize this contextual information when classifying fishing activity patterns. If the geographic location of a track point can be converted into a literal representation, then the trajectory sequence can be expressed as a piece of text, with a certain point regarded as a word within the text. This allows for the extraction of contextual information to facilitate geographical analysis and achieve spatial “semantic” information. Consequently, techniques used to extract contextual information in natural language word processing can be applied to process trajectory sequences. Previously, Yin et al. (2019) used Universal Transverse Mercator coordinates to discretize the Earth’s surface. They then applied geographical units of different granularities to perform GPS encoding and finally trained a neural network to learn the spatial contextual information of global geographical coordinates. Mai et al. (2020) obtained the absolute locations and spatial relationships of encoded locations using the Space2Vec representation learning model. Tian et al. (2022) employed the GCN aided Location2Vec (GCN-L2V) model to learn the contextual semantic relationships between locations. Therefore, the representation of trajectory sequences can also be applied to the identification of fishing vessel operation types.

In summary, existing methods fail to fully utilize the spatiotemporal contextual information contained in fishing-vessel trajectories and exhibit poor fusion of multiple input information for fishing-vessel operational-type identification.

Materials and Methods

In this study, a new adaptive recognition method based on a convolutional neural network was designed employing 1D convolutions and a multi-branch structure. The proposed algorithm, called a multi-branch convolutional neural network (MB-1dCNN), realizes operational type recognition through the enhancement processing of fishing vessel trajectories. This method was applied in experiments to identify fishing-vessel activity patterns for three fishing-vessel operational types: purse seine, trawl net, and gillnet fishing. And in the comparative experiments, the spatial contextual information, the influence of Geohash encoding length and different convolutional neural network structures, such as fully connected network and 1D convolutions, as well as single-branch and multi-branch, on the recognition effect is explored. A single-branch neural network, primarily composed of fully connected layers (SB-FCNN), serves as a benchmark.

This section provides an overview of the Geohash algorithm, embedding, and fully connected neural networks (FCNN) and 1D-CNNs. The structure of single- and multi-branch are also explained, as these components form the basis of the proposed and comparative models explored in this study. In the subsequent section, we compare the results of the MB-1dCNN model with those of a single-branch model with fully connected layers (SB-FCNN), a multi-branch model with fully connected layers (MB-FCNN), and a single-branch structure with 1D convolutions (SB-1dCNN). All architectures can be found in Fig. 1. Figure 1D represents the model structure proposed and recommended in this article, while Figs. 1A, 1B, and 1C respectively represent the model structures of SB-FCNN, SB-1dCNN, and MB-FCNN. The constituent modules of the models are shown in Table 1. Additionally, the datasets and evaluation metrics used in this study are also presented in this section.

Figure 1: The architectures of different neural networks.
(A) SB-FCNN (B) SB-1dCNN (C) MB-FCNN (D) MB-1dCNN.

Download full-size image

DOI: 10.7717/peerj-cs.3020/fig-1

Table 1:

The constituent modules of the neural networks.

	SB-FCNN	MB-FCNN	SB-1dCNN	MB-1dCNN
Input	800	800	800	800
Embedding	20,000, 50	20,000, 50	20,000, 50	20,000, 50
GlobalAvgPool1D	√	√	×	×
Concatenate	4	×	4	×
Dense	128	128	×	×
Dense	64	64	×	×
Dense	32	32	×	×
Dense	16	16	×	×
Conv1D	×	×	4, 5	4, 5
MaxPooling1D	×	×	2	2
Conv1D	×	×	8, 5	8, 5
MaxPooling1D	×	×	2	2
Conv1D	×	×	16, 5	16, 5
MaxPooling1D	×	×	2	2
Conv1D	×	×	32, 5	32, 5
MaxPooling1D	×	×	2	2
Flatten	×	×	√	√
Concatenate	×	4	×	4
Dropout	0.5	0.5	0.5	0.5
Dense	8	8	128	128
Dropout	0.5	0.5	0.5	0.5
Output	3	3	3	3

DOI: 10.7717/peerj-cs.3020/table-1

The contributions of this study are summarized as follows:

(1)

The Geohash location encoding algorithm converts 2D geographic coordinates of the trajectory points into compact 1D alphanumeric strings, combined with the embedding technique to capture the spatiotemporal contextual information from trajectory sequences.
(2)

A multi-branch convolutional neural network architecture utilizing 1D convolutions for fishing vessel operational-type recognition.
(3)

Experimental validation demonstrating that Geohash encoding with length 5 achieves optimal recognition accuracy.
(4)

Empirical evidence showing that 1D convolutional architectures outperform fully connected networks, and multi-branch designs surpass single-branch counterparts in this task.

Geohash

The Geohash algorithm, which was proposed by Gustavo Niemeyer in 2008, is a geocoding method in the public domain (Yin & Chen, 2022; Zhou et al., 2021). What’s particularly elegant about this algorithm is that it follows the pattern of “left is 0, right is 1; bottom is 0, top is 1”, which is used to represent the area after subdivision, achieved by continuously and alternately dichotomizing the global longitude [−180°, 180°] [−180°, 180°] and latitude [−90°, 90°] ranges. In the Geohash approach, the corresponding binary value is recorded until the encoding length reaches the required accuracy. The longitude and latitude are then rearranged into a new binary string. Finally, the corresponding Geohash string is obtained using Base32, which encodes every five binary digits (Suwardi et al., 2015; Li et al., 2020). The alphabetically ordered Geohashes trace out a Z-order curve. As shown in Fig. 2, at the first level of Geohash encoding, different global ranges can be represented by separate 32 American Standard Code for Information Interchange (ASCII) characters. If the positions of the characters are connected with curves in the usual order, the curves appear Z-shaped. This is so called as a Z-order filling curve.

Thus, the Geohash algorithm converts the 2D longitude and latitude coordinates of a point into a 1D string to represent the corresponding geographical location (Zhou et al., 2022, 2020). The length of the Geohash string depends on the encoding accuracy; a longer string provides higher accuracy, and vice versa. A Geohash is generally encoded at Levels 1–12, corresponding to a grid range from 5,000 × 5,000 km² to 0.000000372 × 0.000000186 km² like Table 2. Each Geohash code represents a grid region on Earth and is the same for all coordinate points within that grid. As the Geohash string length increases, the extent of the grid it represents decreases (Irshaid et al., 2021; Jin et al., 2019; Guo et al., 2019).

Table 2:

Precision correspondence between the GeoHash encoding level and geospatial size.

Geohash level	Precision in latitude	Precision in longitude	Precision expressed in kilometers	Grid width in kilometers	Grid height in kilometers
1	±23	±23	±2,500	5,000	5,000
2	±2.8	±5.6	±630	1,250	625
3	±0.7	±0.7	±78	156	156
4	±0.087	±0.18	±20	39.1	19.5
5	±0.022	±0.022	±2.4	4.89	4.89
6	±0.0027	±0.0055	±0.61	1.22	0.61
7	±0.00068	±0.00068	±0.076	0.153	0.153
8	±0.000086	±0.000172	±0.01911	0.0382	0.0191
9	±0.000021	±0.000021	±0.00478	0.00477	0.00477
10	±0.00000268	±0.00000536	±0.0005971	0.00119	0.000596
11	±0.00000067	±0.00000067	±0.0001492	0.00000149	0.00000149
12	±0.00000008	±0.00000017	±0.0000186	0.000000372	0.000000186

DOI: 10.7717/peerj-cs.3020/table-2

Embedding

Embedding techniques were initially used in natural language processing tasks to convert string text into numeric vectors, primarily focusing on representation of words and documents (Mikolov et al., 2013; Zhao et al., 2020; Sonbol, Rebdawi & Ghneim, 2022). This method can be used to convert a word into a vector; that is, the text is converted into a fixed-length vector of consecutive real numbers. Words in a vocabulary are mapped to a potential vector space that summarizes their syntactic and semantic information. The basic assumption of the embedding technique is that words with the same meaning in a text should have similar representations and, thus, embedding vectors that are closer in the vector space are more similar in textual meaning. Therefore, the embedding representation can reveal the background information, that is the textual semantic information (Babić, Martinčić-Ipšić & Meštrović, 2020; Wang, Nulty & Lillis, 2021). Current implementations of embedding methods for natural language processing include Word2vec, a static embedding method based on local semantic information (Mikolov et al., 2013), and Embeddings from Language Models (ELMo), a method based on dynamic semantic information (Wang et al., 2020).

In this study, embedding is implemented using the TensorFlow embedding layer. We do not directly use functions from the Gensim library, which is a popularly used library for Word2vec algorithm’s implementations. The embedding layer in TensorFlow being used to process text sequences can be considered an implement of the Word2vec algorithm. An embedding layer is primarily used to encode a sentence to produce a distributed representation of a word or character. After a sentence passes through this layer, the word vector of each word or character can be obtained, where k denotes the dimensions of the word vector and $X_{i}$ denotes the i-th word. A sentence of length n can then be expressed as

(1) $\begin{matrix} X_{1 : n} = X_{1} \oplus X_{2} \dots \oplus X_{n} \end{matrix}$ where $\oplus$ denotes the concatenation operator and $X_{i : i + j}$ denotes the matrix of feature vectors formed by the word vectors $X_{i}$ , $X_{i + 1} \dots, X_{i : i + j}$ .

In the proposed model, the convolutional, pooling, and fully connected layers for the classification task are positioned behind the embedding layer. This design enables the embedding layer to process semantic information for weight adjustment considering both the contextual information of the word and the classification task, thereby improving the semantic information extraction performance. Then the embedding layer is realized using the function tf.keras.layers.Embedding in TensorFlow to establish the spatiotemporal contextual features of the trajectory segment by an end-to-end method. Table 1 lists the parameters used in this study.

Neural network model construction and comparison

Fully connected neural networks and one-dimensional convolutional neural networks

An FCNN is a classical artificial neural network structure that comprises multiple neurons connected according to a hierarchical structure (Liu et al., 2018). FCNNs typically comprise an input layer, several hidden layers, and an output layer. The input layer receives raw data as input, the hidden layers extract higher-order data features, and the output layer outputs the corresponding results based on the task requirements. Each hidden and output layer contains multiple neurons, and the number of neurons and layers in each hidden layer can be adjusted according to the task complexity and data characteristics. During the training process, the FCNN continuously adjusts the connection weights through a back-propagation algorithm and optimizer so that the network output is as close as possible to the real value and the loss function is minimized. By using a large number of training samples and iterative optimization, the FCNN can learn the complex patterns and laws of the supplied data. Further, FCNNs have strong generalization ability (Liu et al., 2019; Lu et al., 2021). An FCNN can be expressed as

(2) $\begin{matrix} \hat{y} = f (\sum w_{i} x_{i} + b) \end{matrix}$ where $x_{i}$ denotes the input, $w_{i}$ denotes the weight, b denotes the bias, f denotes the activation function, and $\hat{y}$ denotes the output.

A CNN is a model structure that gradually learns data features from low- to high-level patterns (Yamashita et al., 2018). The basic CNN structure comprises input, convolutional, pooling, fully connected, and output layers. The first stage involves the continuous extraction of features from the input data through alternations between the convolutional and pooling layers, as well as data dimensionality reduction through the use of a fully connected layer when approaching the output layer (Zhou, Jin & Dong, 2017). A one-dimensional convolution (1D-CNN) is typically used to process 1D sequential data.

In a 1D-CNN model structure, 1D data are used as input and the data features are continuously extracted using 1D convolution and pooling (Qazi, Almorjan & Zia, 2022). In the convolutional layer, the kernel performs a convolution operation on the feature-vector output of the previous layer. The output feature vector is then constructed using a nonlinear activation function. The output of each layer is the result of the convolution of multiple-input features. The mathematical model can be expressed as

(3) $\begin{matrix} X_{j}^{l} = F (\sum_{i \in M_{j}} X_{i}^{l - 1} \times k_{i j}^{l} + b_{j}^{l}) \end{matrix}$ where $M_{j}$ represents the set of input features or receptive field for the j-th output neuron in a convolutional layer; l denotes the l^th network layer; k denotes the convolution kernel; b denotes the network bias; $X_{i}^{l}$ and $X_{i}^{l - 1}$ denote the l^th-layer output and input, respectively; and $F (\cdot)$ denotes the activation function.

Single- and multi-branch structures

Single- and multi-branch strategies are two alternative model-building strategies. For a model with a single-branch structure, multiple factors are stitched together and fed into the model. When a multi-branch structure is employed, factor information is extracted using a separate substructure; the extracted information is then combined and fed into the subsequent network for learning.

In this study, we developed a new MB-1dCNN model based on the Geohash algorithm, word embedding, and CNN deep learning methods. The model was designed to identify the activity patterns of commercial fishing vessels. The model structure is illustrated in Fig. 1D. Additionally, we constructed the SB-FCNN (Fig. 1A), MB-FCNN (Fig. 1C), and SB-1dCNN (Fig. 1B) models.

The MB-FCNN model can be expressed as

(4) $\begin{matrix} X_{t} = C ({\hat{y}}_{1}, \dots, {\hat{y}}_{n}) \end{matrix}$ where ${\hat{y}}_{1}$ and ${\hat{y}}_{n}$ denote the outputs of the first and n^th layers, respectively; $C (\cdot)$ denotes multiple-value splicing; and $X_{t}$ denotes the layer after splicing.

The proposed MB-1dCNN model extracts the features of each input factor separately through convolution and pooling and then combines the extracted information in a fully connected layer before output. Compared to a CNN, into which all factors are input, the MB-1dCNN can extract the features of each factor more efficiently. The mathematical model can be expressed as

(5) $X_{t} = C (\underset{X_{j}^{l}}{\underset{⏟}{F A (M P (X_{i}^{l}))}}, \dots)$ where $M P (\cdot)$ denotes the pooling operation, $F A (\cdot)$ denotes the flattening function, $C (\cdot)$ denotes the splicing of multiple values, and $X_{t}$ denotes the layer after splicing.

Overall, the SB-FCNN model comprises an input layer, five dense layers, two dropout layers, and an output layer; the MB-FCNN model comprises an input layer, four dense layers for separate extraction of each input feature, two dropout layers, one dense layer for extraction of the combined feature information, and an output layer; the SB-1dCNN model comprises an input layer, four Conv1D layers, MaxPooling1D layer modules, two dropout layers, a dense layer, and an output layer; and the MB-1dCNN model comprises an input layer, four Conv1D layers and MaxPooling1D layer modules for separate extraction of each input feature, two dropout layers, a dense layer for extraction of the combined feature information, and an output layer. The parameters of the four models are listed in Table 1.

All of these models have four inputs: the location data through embedding processing after Geohash geocoding transformation, the embedding time data, speed and heading. Since the representation method of words in natural language processing is referenced, the vector obtained after embedding processing after Geohash transformation can be considered to contain the spatiotemporal contextual information of the trajectory. Except that the activation function used in the last layer of the models is SoftMax, the activation function used in other layers is rectified liner unit (ReLu) activation function. The Adam optimizer was implemented with a learning rate of 0.001; CategoricalCrossentropy was used as the loss function. The GPU was a GeForce RTX 2080 Ti run on a Windows operating system. The MB-1dCNN model was built using Python 3.7 and TensorFlow 2.4.1.

Datasets

Experimental data were obtained from the 2020 Digital China Innovation Competition (https://tianchi.aliyun.com/competition/entrance/231768/introduction), which provided trajectory data for three fishing-vessel operational types: trawl net, purse seine, and gillnet fishing. In these datasets, a trajectory segment comprises several track points of the vessel at sea based on a unique vessel ID, along with the fishing type corresponding to that track segment.

In this study, these data were preprocessed to remove operational track segments having fewer than five track points and heading directions outside the range of 0–360°. The final data volume was 8,163. The data fields comprised the fishing vessel ID, longitude, latitude, speed, heading, timestamp, and registered fishing method or operational type. The longitude, latitude, speed, and direction ranges were [0°, 127.803°], [0°, 44.824°], [0, 11] knots, and [0°, 360°], respectively. The distribution of each feature value domain in the dataset is shown in Fig. 3. The length distribution of the single operational trajectories and the data volumes for the different fishing behaviors are shown in Fig. 4. The track-point sequence length of a single-vessel trajectory segment ranged from 21 to 5,841, with an average length of ~598. From Fig. 4, it is evident that the majority of trajectory sequences have a length of less than 800. Consequently, the trajectory length for the model’s input has been determined to be 800. The dataset was divided into training, validation, and test datasets at a ratio of 7:2:1. The details of the dataset division are presented in Table 3.

Figure 3: Distribution of field values in the datasets.

Download full-size image

DOI: 10.7717/peerj-cs.3020/fig-3

Figure 4: Length distribution of trajectory segments.

Download full-size image

DOI: 10.7717/peerj-cs.3020/fig-4

Table 3:

The amount of data in the datasets for different fishing operation types.

	Purse seine	Trawl net	Gillnet	Total data
Train	2,479	2,302	933	5,714
Validation	708	658	266	1,632
Test	354	329	134	817
Total data	3,541	3,289	1,333	8,163

DOI: 10.7717/peerj-cs.3020/table-3

Note:

Accuracy, Precision, Recall, and F1 values in the table are the weighted averages of the three operation types. The weight is the proportion of each operation type to the total samples.

The spatiotemporal contextual features of the fishing vessel trajectories were extracted from the original dataset and used as inputs for the recognition model. The model input features included the tensors of the spatiotemporal contextual features of the fishing vessel trajectories, times, speeds, and headings. The spatiotemporal context tensor of the fishing-vessel trajectories was obtained by encoding the longitude and latitude data using the Geohash method and performing extraction by embedding using natural language processing. The time feature was obtained by converting the “Month‒Day” data to a character string; this information was then represented by a vector obtained by embedding of nature language processing. The maximum-minimum normalisation method (Bhanja & Das, 2018) is used to preprocess the data on speed and direction of heading, so as with the non-geohash transformation geolocation data for the comparison.

Evaluation metrics

Learner or model generalizability is a performance metric that requires valid and feasible experimental estimation methods and evaluation criteria. The performance metrics reflect the task requirements, and different metrics can often yield different results when the capabilities of different models are compared.

Following the construction of the complete classifier, we used a series of evaluation metrics to determine its effectiveness. Accuracy is one of the most commonly used performance measures in traditional classifier evaluation metrics, calculated as the ratio of correctly classified samples to the total number of samples. Typically, this metric can be applied to standard balanced data having a positive-to-negative sample ratio of 1:1. However, the number of samples is usually inconsistent between categories. In such cases, Accuracy should not be the only metric used to evaluate the classifier. For example, for a sample with a positive-to-negative sample ratio of 999:1, the model Accuracy can reach 99.9% even if the classifier predicts only the negative sample as positive. However, only the negative-sample information must be considered.

Consequently, another evaluation metric F1 is introduced, for which a more significant value indicates better model classification. F1 takes into account both Recall and Precision. Recall denotes the proportion of all positive samples predicted to be correct in the classification process, as well as Precision denotes the proportion of samples predicted to be in the positive class by the model during classification, calculated as the percentage of the correct predictions made by the model.

To reduce the resulting instability caused by the randomization of the neural network parameters, for both Accuracy and F1, we used the mean ± standard deviation values of the maximum and minimum values obtained after 12 training sessions under the same conditions for the same model.

Results and discussion

Influence of spatiotemporal contextual information on the classification accuracy of fishing vessel operation types

To investigate the impact of spatiotemporal contextual information on the classification accuracy of fishing vessel operation types, the comparison experiment using two different sets of trajectories data was designed. One set utilized latitude and longitude data directly for model training and prediction. The other set incorporated the spatiotemporal context information of the trajectories for the same purpose. The spatial contextual feature extraction of the trajectory were extracted using the GeoHash and Embedding techniques. Subsequently, the MB-1dCNN model was employed for training and prediction, achieving accuracy and F-score of 86.93% and 83.84% respectively on the test set.

As shown in Table 4, the comparison of these two experimental sets reveals that the classification accuracy and F-score improved by 14.13% and 22.36% respectively when using the Embedding method based on GeoHash transformation of vessel trajectories, compared to the model without spatiotemporal contextual information. This demonstrates that incorporating spatial contextual information from trajectories into the fishing vessel operation type classification model (MB-1dCNN) is highly effective in enhancing classification accuracy. Consequently, the spatiotemporal contextual information of trajectories promises to play a significant role in identifying fishing vessel types.

Table 4:

Spatiotemporal contextual information effect on the fishing operation type recognition.

Features	Accuracy	F-score
Lon&Lat	0.7280 ± 0.0045	0.6148 ± 0.0110
GH—EM (Lon&Lat)	0.8693 ± 0.0040	0.8384 ± 0.0053

DOI: 10.7717/peerj-cs.3020/table-4

Impact of different Geohash lengths on results

To address the issue of identifying fishing vessel operation types, we examined the effect of varying Geohash lengths on the MB-1dCNN model’s ability to identify operation types using the spatiotemporal contextual features of fishing vessel trajectories by comparing the accuracy. Figure 5 illustrates a plot of the change in recognition accuracy with the Geohash length. As the Geohash length increased, both the weighted Accuracy and weighted F1 scores of the model exhibited an upward trend followed by a decline. The highest classification accuracy was achieved at a Geohash length of 5; the weighted Accuracy was 0.8595 ± 0.0082 and weighted F1 being 0.8573 ± 0.0085. The corresponding spatial extent was approximately 4.89 × 4.89 km².

Figure 5: Model results of MB-1dCNN based on different lengths of geohash encoding.

Download full-size image

DOI: 10.7717/peerj-cs.3020/fig-5

As previously noted, each Geohash code represents a grid region on Earth, with the code length denoting the grid-region size. Figure 6 presents results for the gillnet, trawl net, and purse seine fishing operations in columns from left to right, respectively. The first row of Fig. 6 shows the original fishing vessel trajectories for each of these operations. The second to fourth rows (from top to bottom) present overlays of the Geohash geographic grids on the vessel trajectories when the Geohash lengths were 2, 5, and 6, respectively. When the Geohash length was too short (as seen in the second row of Fig. 6, with length = 2), the geospatial region was larger, with most of the trajectory points enrolled in one region. This led to information loss under the Geohash coding, thereby reducing the model classification accuracy. As the Geohash length gradually increased, the region size became more reasonable, and the specific fishing behavioral characteristics of the different fishing vessel types became evident after sequence encoding. Consequently, model classification accuracy improved (see the third row of Fig. 6, with length = 5). However, when the Geohash length was excessively long, the region division granularity became too fine, introducing noise and reducing model classification accuracy (see the fourth row of Fig. 6, with length = 6).

Figure 6: Trajectories of different fishing types overlaid with the grids represented by the different geohash encoding lengths.
From left to right: gillnet, trawl net, and purse seine. From top to bottom: original trajectory; overlay of original trajectory points with geohash grid when the encoding length is 2, 5, and 6 respectively.

Download full-size image

DOI: 10.7717/peerj-cs.3020/fig-6

Impact of different network structures on the model

We examine the recognition accuracies of the different models for the three fishing vessel operational types under investigation in this study. As noted previously, we utilized the trajectory-point latitude and longitude to construct the spatiotemporal contextual feature. It was then used as input feature for the models, alongside embedded-time, speed, and heading. As depicted in Table 5, the SB-1dCNN model achieved a weighted average Accuracy of 0.8660 ± 0.0037 and F1 score of 0.8636 ± 0.0038. The MB-1dCNN model showed slightly higher values, with a weighted average Accuracy of 0.8671 ± 0.0058 and F1 score of 0.8651 ± 0.0058. Both models demonstrated superior performance to the SB-FCNN and MB-FCNN models.

Table 5:

Evaluation metrics of different neural networks for the recognition effect of fishing operations type.

Model	Accuracy	Precision	Recall	F1
SB-FCNN	0.7841 ± 0.0334	0.7178 ± 0.0907	0.7841 ± 0.0334	0.7362 ± 0.0601
MB-FCNN	0.8365 ± 0.0341	0.8187 ± 0.0714	0.8365 ± 0.0341	0.8197 ± 0.0580
SB-CNN	0.8660 ± 0.0037	0.8648 ± 0.0041	0.8660 ± 0.0037	0.8636 ± 0.0038
MB-CNN	0.8671 ± 0.0058	0.8669 ± 0.0059	0.8671 ± 0.0058	0.8651 ± 0.0058

DOI: 10.7717/peerj-cs.3020/table-5

The SB-1dCNN and MB-1dCNN models outperformed their SB-FCNN and MB-FCNN counterparts. This indicates that CNN models are better suited for fishing-vessel operational types recognition than FCNN models. The success of CNN models may be due to their ability to mitigate the impact of noise between neighboring trajectory points, which can arise from becalmed scenarios or missed reports in certain trajectory segments. By using a convolution kernel of a specific size to extract features from trajectory segments, CNN models effectively reduce the influence of such noise, unlike FCNN models, which are more susceptible to noise-induced classification accuracy reductions. Thus, the results confirm that 1D-convolution is superior to fully connected networks for feature extraction in sequential data.

Effect of single- and multi-branch structures on model performance

As previously illustrated in Fig. 1, two distinct model-building strategies have been developed for 1D-sequence data: single- and multi-branch strategies. In the single-branch approach, multiple factors are concatenated and fed into the model sequentially as depicted in Figs. 1A and 1B. In contrast, the multi-branch method involves inputting multiple factors into separate substructures, and then combining the extracted information before feeding it into the subsequent network for learning as depicted in Figs. 1C and 1D.

The experimental results presented in Table 3 indicate that, for both the FCNN and CNN structures, the multi-branch-based models exhibit higher classification accuracy than the single-branch models. Specifically, the MB-1dCNN model demonstrated improvements of 0.0011 in weighted Accuracy and 0.0015 in weighted F1 score compared to the SB-1dCNN model. Similarly, the MB-FCNN model showed enhancements of 0.0524 in weighted Accuracy and 0.0835 in weighted F1 score in comparison to the SB-FCNN model. This difference may be due to the number of neural networks used to handle feature extraction. The multi-branch structure allows for separate feature extraction and fusion of inputs, avoiding the noise impact that occurs when inputs are directly concatenated in the single-branch structure. Consequently, when dealing with multiple types of different inputs, a multi-branch structure can more effectively extract the features of the inputs and enhance the model classification accuracy. Overall, this outcome further demonstrates the advantages of the proposed MB-1dCNN model in identifying fishing vessel operational types.

Conclusions

Accurate identification of the activity patterns or operational types of commercial fishing vessels is essential for the conservation and sustainable development of fishery resources. In this study, the MB-1dCNN model was developed to identify fishing vessel operation types using vessel trajectory data. The MB-1dCNN model is an end-to-end one-stage deep learning model that optimizes the overall objective of a task during the training process, and reduces the reliance on feature engineering common in traditional methods. Consequently, it reduces human intervention and is more suitable for the analysis of feature importance than existing alternatives. The Geohash geocoding algorithm was used to convert the geographic location of a track point into a literal representation and embedding techniques were used to extract spatio-temporal context information from trajectory sequences.

The MB-1dCNN model was compared with the SB-1dCNN, SB-FCNN, and MB-FCNN models in terms of recognition accuracy, and the proposed model exhibited better recognition performance for the three types of fishing operations considered in this study: trawl net, purse seine, and gillnet fishing. Overall, the 1D-CNN models exhibited better performance than the FCNN models when classifying the fishing vessels by operational type. Moreover, because multiple model input features were considered, both the 1D-CNN and FCNN models exhibited better performance under a multi-branch structure than a single-branch structure. Therefore, the combination of multi-branch structure and 1D-CNN has achieved superior results.

The influence of the Geohash length on the model accuracy was also explored. The weighted Accuracy and weighted F1 scores of the model tended to increase before decreasing with increasing Geohash length. The highest classification accuracy was obtained when the Geohash length was 5, with a weighted Accuracy of 0.8595 ± 0.0082 and a weighted F1 score of 0.8573 ± 0.0085. The spatial range represented by this Geohash encoding was approximately 4.89 × 4.89 km².

Overall, the MB-1dCNN model proposed in this article effectively realized fishing-vessel operational-type recognition by using the Geohash and embedding techniques and employing a convolutional neural network. Consequently, it reduces human intervention and is more suitable for the analysis of feature importance than existing alternatives. In the future, the authors will consider the importance or contribution of different features for the recognition effect using this method, and will compare this method with other methods such as LSTM to screen out a better model. Because vessels of different fishing operational types have different trajectory patterns in temporal and spatial distribution, better ways to obtain or identify these patterns will be explored in the future, possibly considering transformer models or graph neural networks.

Supplemental Information

All the codes used in this research.

DOI: 10.7717/peerj-cs.3020/supp-1

Download

[1] Ashrafi A, Tessem B, Enberg K. 2023. Detection of fishing activities from vessel trajectories. In: Nurcan S, Opdahl AL, Mouratidis H, Tsohou A, eds. Research Challenges in Information Science: Information Science and the Connected World. RCIS 2023. Lecture Notes in Business Information Processing. Cham: Springer. 476

[2] Babić K, Martinčić-Ipšić S, Meštrović A. 2020. Survey of neural text representation models. Information 11(11):511

[3] Bhanja S, Das A. 2018. Impact of data normalization on deep neural network for time series forecasting. ArXiv

[4] FAO. 2020. The state of world fisheries and aquaculture 2020. Sustainability in Action. Rome

[5] Feng Y, Zhao X, Han M, Sun T, Li C. 2019. The study of identification of fishing vessel behavior based on VMS data.

[6] Gao B, Wang L, Zhai Z, Wang Y, Zhu Y, Yao T, Liu D, Zhang L. 2020. Identification algorithm of fishing vessel operation type based on feature fusion.

[7] Garani G, Adam GK. 2020. A semantic trajectory data warehouse for improving nursing productivity. Health Information Science and Systems 8(1):25

[8] Guo N, Xiong W, Wu Y, Chen L, Jing N. 2019. A geographic meshing and coding method based on adaptive Hilbert-Geohash. IEEE Access 7:39815-39825

[9] Huang H, Hong F, Liu J, Liu C, Feng Y, Guo Z. 2019. FVID: fishing vessel type identification based on VMS trajectories. Journal of Ocean University of China 18(2):403-412

[10] Iacarella JC, Burke L, Clyde G, Wicks A, Clavelle T, Dunham A, Rubidge E, Woods P. 2023. Application of AIS-and flyover-based methods to monitor illegal and legal fishing in Canada’s Pacific marine conservation areas. Conservation Science and Practice 5(6):e12926

[11] Irshaid H, Hasan MM, Hasan R, Oh SJ. 2021. User activity and trip recognition using spatial positioning system data by integrating the Geohash and GIS approaches. Transportation Research Record 2675(4):391-405

[12] Janiesch C, Zschech P, Heinrich K. 2021. Machine learning and deep learning. Electronic Markets 31(3):685-695

[13] Jin J, Xiao Z, Qiu Q, Fang J. 2019. A Geohash based Place2vec model.

[14] Kassawmar T, Zeleke G, Bantider A, Gessesse DG, Abraha L. 2018. A synoptic land change assessment of Ethiopia’s rainfed agricultural area for evidence-based agricultural ecosystem management. Heliyon 4(11):e00914

[15] Kim K, Lee KM. 2020. Convolutional neural network-based gear type identification from automatic identification system trajectory data. Applied Sciences 10(11):4010

[16] LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521(7553):436-444

[17] Li G, Kuang S, Jiang G, He H. 2020. Electromagnetic big data track mining based on GeoHash. Electronic Information Warfare Technology 35(4):23-27

[18] Li Y, Zhang CZ, Zhai S, Li L. 2021a. Identification method of fishing vessel operation type based on LightGBM and BiLSTM.

[19] Li X, Zhou L, Xiao Y, Wu W, Su F, Shi W. 2021b. Spatial characteristics mining of fishing intensity in the Northern South China Sea based on fishing vessels AIS data. Journal of Geo-Information Science 23(5):850-859

[20] Liu S, Chen Z, Pan M, Zhang Q, Liu Z, Wang S, Chen D, Hu J, Pan X, Hu J, Li P, Wan C. 2019. Magnetic anomaly detection based on full connected neural network. IEEE Access 7 182198–182206

[21] Liu K, Kang G, Zhang N, Hou B. 2018. Breast cancer classification based on fully-connected layer first convolutional neural networks. IEEE Access 6:23722-23732

[22] Lu C, Wang L, Shen J, Lin J. 2021. Image enhancement using deep-learning fully connected neural network mean filter. The Journal of Supercomputing 77(3):3144-3164

[23] Mai G, Janowicz K, Yan B, Zhu R, Cai L, Lao N. 2020. Multi-scale representation learning for spatial feature distributions using grid cells. ArXiv

[24] Mesquita C, Dobby H, Jones SC, Pierce JG. 2024. Estimating fishing effort and LPUE for the Scottish brown crab (Cancer pagurus) trap fishery using VMS and observer data. Fisheries Research 274:106974

[25] Mikolov T, Chen K, Corrado G, Dean J. 2013. Efficient estimation of word representations in vector space. ArXiv

[26] Natale F, Gibin M, Alessandrini A, Vespe M, Paulrud A. 2015. Mapping fishing effort through AIS data. PLOS ONE 10(6):e0130746

[27] Orofino S, McDonald G, Mayorga J, Costello C, Bradley D. 2023. Opportunities and challenges for improving fisheries management through greater transparency in vessel tracking. ICES Journal of Marine Science 80(4):675-689

[28] Qazi EUH, Almorjan A, Zia T. 2022. A one-dimensional convolutional neural network (1D-CNN) based deep learning system for network intrusion detection. Applied Sciences 12(16):7986

[29] Rockström J, Barron J, Fox P. 2002. Rainwater management for increased productivity among small-holder farmers in drought prone environments. Physics and Chemistry of the Earth, Parts A/B/C 27(11–22):949-959

[30] Rousseau Y, Watson RA, Blanchard JL, Fulton AE. 2019. Evolution of global marine fishing fleets and the response of fished resources. Proceedings of the National Academy of Sciences of the United States of America 116(25):12238-12243

[31] Sonbol R, Rebdawi G, Ghneim N. 2022. The use of NLP-based text representation techniques to support requirement engineering tasks: a systematic mapping review. IEEE Access 10(4):62811-62830

[32] Suwardi IS, Dharma D, Satya DP, Lestari DP. 2015. Geohash index based spatial data model for corporate.

[33] Tang X, Zhang S, Fan W, Pei K. 2020. Fishing type identification of gill net and trawl net based on deep learning. Marine Fisheries 42(2):233-244

[34] Tian C, Zhang Y, Weng Z, Gu X, Chan WK. 2022. Learning large-scale location embedding from human mobility trajectories with graph neural networks.

[35] Vermard Y, Rivot E, Mahévas S, Marchal P, Gascuel D. 2010. Identifying fishing trip behaviour and estimating fishing effort from VMS data using Bayesian Hidden Markov models. Ecological Modelling 221(15):1757-1769

[36] Walker E, Bez N. 2010. A pioneer validation of a state-space model of vessel trajectories (VMS) with observer’ data. Ecological Modelling 221(17):2008-2017

[37] Wang Y, Hou Y, Che W, Liu T. 2020. From static to dynamic word representations: a survey. International Journal of Machine Learning and Cybernetics 11(7):1611-1630

[38] Wang C, Nulty P, Lillis D. 2021. A comparative study on word embeddings in deep learning for text classification.

[39] Xu P, Xie M, Zhou W, Suo A. 2021. Research on fishery resource assessment and sustainable utilization (FRASU) during 1990–2020: a bibliometric review. Global Ecology and Conservation 29(2):e01720

[40] Yamashita R, Nishio M, Do RKG, Togashi K. 2018. Convolutional neural networks: an overview and application in radiology. Insights Into Imaging 9(4):611-629

[41] Yang S, Shi H, Zhang S, Zhang H, Fei Y, Wang F, Jin W, Wang G, Wei F. 2022. Pelagic fishing vessel classification using bidirectional long short-term memory networks. Marine Sciences 46(3):25-35

[42] Yin F, Chen H. 2022. k anonymous location privacy preservation scheme based on Geohash coding. Wuhan University Journal of Natural Sciences 68(1):73-82

[43] Yin Y, Liu Z, Zhang Y, Wang S, Shah RR, Zimmermann R. 2019. Gps2vec: towards generating worldwide GPS embeddings.

[44] Zhang H, Yang S, Fan W, Shi H, Yuan S. 2021. Spatial analysis of the fishing behaviour of tuna purse seiners in the western and central Pacific based on vessel trajectory data. Journal of Marine Science and Engineering 9(3):322

[45] Zhao Y, Zhang J, Li Y, Fu X, Sheng W. 2020. Sentiment analysis using embedding from language model and multi-scale convolutional neural network. Journal of Computer Applications 40(3):651-657

[46] Zheng QL, Fan W, Zhang SM, Zhang H, Wang X, Guo G. 2016. Identification of fishing type from VMS data based on artificial neural network. South China Fisheries Science 12(2):81-87

[47] Zhou W, Chen X, Fan W, He Z, Yu L, Dai Y, Wang L. 2015. Location-based information service’s applications and its prospects in marine fisheries. World Sci-Tech R&D 37(05):611-617