A novel fast pedestrian recognition algorithm based on point cloud compression and boundary extraction

PeerJ Computer Science


Introduction

State-of-the-art studies

Compression method based on boundary extraction

Point cloud compression method based on boundary extraction

Calculation of normal vector and curvature

where a, b, and c denote the coefficients of the plane equation and also represent the components of the normal vector at the query point pi; x, y, and z represent the three-dimensional coordinates of the points in the k-neighbourhood; and d denotes the distance from the origin to the plane. The covariance matrix A is computed from the k-neighbourhood elements of the query point pi as follows:

where k denotes the number of neighbouring points of the point pi, and p̄ represents the three-dimensional centroid of the neighbouring points' coordinates. The covariance matrix A has three real eigenvalues λ1, λ2, λ3 (λ1 ≤ λ2 ≤ λ3); the corresponding eigenvectors are denoted by n1, n2, n3, with n1 denoting the normal vector (nx, ny, nz). The curvature value of the query point pi is then calculated by k-neighbourhood surface fitting (Yu et al., 2010). The quadric surface function has universal applicability and is convenient for subsequent curvature calculation; therefore, this article uses quadric surface fitting to calculate the curvature value. The initial fitting function of the quadric surface is generally defined by

where a, b, c, e, and f denote the coefficients of the quadric surface fit. Equation (3) is a single-valued function: each pair of input values corresponds to a unique z value. However, many non-single-valued mapping problems exist in the collected data (Xin & Jingying, 2020). Therefore, it is necessary to establish a local coordinate system (u, v, ω) with the query point pi as the origin. The ω axis points in the same direction as the normal vector at pi, while the u and v axes are orthogonal and lie in the tangent plane; together with the ω axis they form a rectangular coordinate system. After applying the required translation and rotation, the local coordinate system is established, and Eq. (3) is rewritten in the following parametric form in local coordinates:
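The normal-vector and eigenvalue computation described above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: it assumes the point cloud is an (N, 3) array with a precomputed neighbour index per query point, and it uses the eigenvalue-based surface variation λ1/(λ1+λ2+λ3) as a stand-in for the curvature, whereas the article obtains curvature by quadric surface fitting.

```python
import numpy as np

def normal_and_curvature(points, neighbour_idx):
    """Estimate the normal vector and a curvature proxy for each query point
    from the covariance matrix A of its k-neighbourhood."""
    normals, curvatures = [], []
    for idx in neighbour_idx:            # idx: indices of the k neighbours
        nbrs = points[idx]
        centroid = nbrs.mean(axis=0)     # three-dimensional centroid p_bar
        diff = nbrs - centroid
        A = diff.T @ diff / len(nbrs)    # 3x3 covariance matrix
        eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
        normals.append(eigvecs[:, 0])    # eigenvector of the smallest eigenvalue
        # surface variation: a common eigenvalue-based curvature proxy
        curvatures.append(eigvals[0] / eigvals.sum())
    return np.array(normals), np.array(curvatures)
```

For a perfectly planar neighbourhood, the smallest eigenvalue is zero, so the curvature proxy vanishes and the normal is exactly perpendicular to the plane.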

The extraction of boundary points

  • (1) With pi as the reference point and pj (j = 1, 2, …, k) as the points in its k-neighbourhood, the vectors vij are established and projected onto the local coordinate system constructed in the previous step, forming the projection vectors vj.

  • (2) Moving counterclockwise, calculate the angle βj(j+1) (when j = k, take j + 1 = 1) between each pair of adjacent projection vectors, sort the angles βj(j+1) in ascending order, and calculate the maximum difference Lmax between two adjacent angles.

  • (3) Determine whether pi is a boundary point by comparing Lmax with the given angle threshold Lstd: if Lmax > Lstd, pi is a boundary point; otherwise, it is an internal point.
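Steps (2) and (3) above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the caller has already projected the neighbourhood vectors onto the tangent plane (step (1)), so each projection vector is a 2-D point, and it flags a boundary point when the largest angular gap between adjacent projection vectors exceeds the threshold.

```python
import numpy as np

def is_boundary_point(projected, l_std=np.pi / 2):
    """Sort the polar angles of the projected neighbourhood vectors and
    compare the largest gap between adjacent angles (including the
    wrap-around gap, j = k -> j + 1 = 1) with the threshold l_std."""
    angles = np.sort(np.arctan2(projected[:, 1], projected[:, 0]))
    gaps = np.diff(angles)
    # wrap-around gap between the last and first angle
    gaps = np.append(gaps, 2 * np.pi - (angles[-1] - angles[0]))
    l_max = gaps.max()
    return l_max > l_std   # a large angular gap indicates a boundary point
```

Intuitively, an interior point is surrounded on all sides, so its projected neighbours cover the full circle with only small gaps; a boundary point leaves one large empty sector.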

The extraction of sharp points

where k denotes the number of points in the neighbourhood of pi; Hj denotes the curvature value of the point pj in the neighbourhood; Hi represents the curvature value of pi; and H̄ denotes the mean curvature of the points pj in the neighbourhood.

where αi denotes the angle between the normal vector ni of the point pi and the normal vector nj of the point pj in the neighbourhood.

where λ denotes the control coefficient of the local curvature weight δi; τ represents the control coefficient of the mean included angle of the normal vectors, ᾱi; and d̄i denotes the average distance between pi and the points pj in its neighbourhood.

where Hmax denotes the maximum curvature value and t represents the distance value of the sharp points. If f > F, pi is classified as a sharp point; otherwise, it is a flat point.
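The sharp-point score can be sketched as below. This is a hypothetical reconstruction, since the exact equations are given only in the article's figures: it treats the score f as a weighted combination of the local curvature deviation δi, the mean normal-vector angle ᾱi, and the mean neighbour distance d̄i, with λ and τ as the control coefficients named above. The weighting form and default values are assumptions for illustration.

```python
import numpy as np

def sharp_point_score(H_i, H_nbrs, alpha_bar, d_bar, lam=0.5, tau=0.5):
    """Hypothetical sharp-point score f: combines the local curvature
    deviation delta_i = |H_i - H_bar|, the mean normal-vector angle
    alpha_bar, and the mean neighbour distance d_bar, weighted by the
    control coefficients lam (lambda) and tau."""
    H_bar = np.mean(H_nbrs)            # mean curvature of the neighbourhood
    delta_i = abs(H_i - H_bar)         # local curvature deviation
    return lam * delta_i + tau * alpha_bar + d_bar
```

A point is then classified as sharp when its score exceeds the threshold F, and as flat otherwise.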

The grid compression of point cloud

Point cloud pedestrian recognition algorithm based on image mapping

The network structure of the YOLOv5

  • (1) Input: YOLOv5 retains the Mosaic data-augmentation method of YOLOv4 to enrich data sets, lower hardware requirements, and reduce GPU usage. The adaptive anchor-box calculation is embedded in the training code and can be switched automatically, and adaptive image scaling improves the speed of target detection at the inference stage.

  • (2) Backbone: The Focus module is a structure unique to YOLOv5. Its key idea is to slice the feature map into blocks and then pass the image features to the next stage through multi-layer convolution and pooling, the cross-stage partial network (CSPNet) (Chien-Yao et al., 2020), and the spatial pyramid pooling (SPP) (He et al., 2015) structures.

  • (3) Neck: The feature pyramid network (FPN) (Tsung-Yi et al., 2017) and path aggregation network (PAN) (Shu et al., 2018) structures of YOLOv4 are retained. However, the neck of YOLOv5 differs in that it adopts the CSP2 structure, designed with reference to CSPNet, which strengthens information propagation and accurately preserves spatial information.

  • (4) Prediction: The improved bounding-box loss, GIoU-Loss, can accurately identify objects under overlapping occlusion and resolves the defects of the conventional IoU.

The loss function of the YOLOv5 network

  • (1) Bounding box loss:

where IoU denotes the ratio of the overlapping area of the prediction box and the target box to the area of their union; ρ denotes the distance between the centre points of the prediction box and the target box; ν indicates the similarity of the aspect ratios; and α represents the influence coefficient of ν.

  • (2) Confidence loss and classification loss:

where y denotes the real label and p(x) represents the model output.
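The bounding-box loss described above can be sketched as follows. Note that although the article names GIoU-Loss, the symbols ρ, ν, and α correspond to the CIoU formulation; the sketch below implements that formulation under the assumption of corner-format boxes (x1, y1, x2, y2), and is an illustration rather than the authors' code.

```python
import math

def ciou_loss(box_p, box_t):
    """CIoU-style bounding-box loss: 1 - IoU + rho^2/c^2 + alpha * nu,
    combining overlap, centre distance, and aspect-ratio similarity."""
    # intersection and union areas
    inter_w = max(0.0, min(box_p[2], box_t[2]) - max(box_p[0], box_t[0]))
    inter_h = max(0.0, min(box_p[3], box_t[3]) - max(box_p[1], box_t[1]))
    inter = inter_w * inter_h
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    iou = inter / (area_p + area_t - inter)
    # squared centre-point distance rho^2, normalised by the squared diagonal
    # c^2 of the smallest enclosing box
    cx_p, cy_p = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cx_t, cy_t = (box_t[0] + box_t[2]) / 2, (box_t[1] + box_t[3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = max(box_p[2], box_t[2]) - min(box_p[0], box_t[0])
    ch = max(box_p[3], box_t[3]) - min(box_p[1], box_t[1])
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio similarity nu and its influence coefficient alpha
    w_p, h_p = box_p[2] - box_p[0], box_p[3] - box_p[1]
    w_t, h_t = box_t[2] - box_t[0], box_t[3] - box_t[1]
    nu = (4 / math.pi ** 2) * (math.atan(w_t / h_t) - math.atan(w_p / h_p)) ** 2
    alpha = nu / ((1 - iou) + nu + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * nu
```

For identical boxes the loss is zero; for disjoint boxes the centre-distance term still provides a gradient, which is the defect of the plain IoU that these variants resolve.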

Mapping of point cloud data to images

where P(i)rect denotes the intrinsic parameter matrix of the ith camera, R(0)rect represents the rectification matrix from each camera to camera 0, and Tcamvelo denotes the rotation and translation matrix between the laser radar and the camera. Mapping image pixels to the point cloud data (PCD) requires expanding each matrix by a row and a column. X represents the homogeneous coordinate form of the PCD, and Tcamvelo denotes the external parameter matrix of the laser radar and the camera obtained through calibration, comprising a rotation matrix and a translation matrix.
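The projection above follows the usual KITTI-style calibration chain, and can be sketched as follows. This is a minimal illustration under the assumption that P(i)rect is a 3×4 intrinsic matrix and that R(0)rect and the extrinsic matrix have already been expanded to 4×4 form as described.

```python
import numpy as np

def lidar_to_image(points_velo, P_rect, R_rect, T_velo_to_cam):
    """Project lidar points onto the image plane via
    y = P_rect^(i) @ R_rect^(0) @ T_velo^cam @ X.
    points_velo: (N, 3) lidar coordinates; P_rect: (3, 4);
    R_rect, T_velo_to_cam: (4, 4) expanded matrices."""
    N = points_velo.shape[0]
    X = np.hstack([points_velo, np.ones((N, 1))])     # homogeneous form of PCD
    y = (P_rect @ R_rect @ T_velo_to_cam @ X.T).T     # (N, 3) image coordinates
    return y[:, :2] / y[:, 2:3]                        # perspective divide -> (u, v)
```

With identity calibration matrices, a point at depth z simply maps to (x/z, y/z), which makes the perspective divide easy to verify.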

K-means clustering of point cloud pedestrian

where ui denotes the mean vector of cluster Ci and is given by Eq. (18). For each point, the distance to the centre points of all clusters is calculated, and the point is assigned to the cluster with the nearest centre. After each iteration, the centre point of every cluster is recalculated and the nearest centre for each point is found again, until the results of two successive iterations no longer change.
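The iterative procedure just described is plain K-means; a minimal sketch (with random initialisation and no empty-cluster handling, so an illustration rather than a production implementation) is:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means: assign every point to its nearest centre, recompute
    each centre as the mean vector u_i of its cluster C_i, and stop when two
    successive iterations give the same assignment."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break   # no change between two successive iterations: converged
        labels = new_labels
        centres = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centres
```

On well-separated clusters the assignment stabilises after a few iterations, which is the stopping condition the article uses.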

Experiment and the analysis of the results

Basic idea of the compression method

  1. Read the point cloud, and establish the topological structure for the point cloud.

  2. Calculate the normal vector and curvature of the point cloud.

  3. Extract the feature points and sharp points of the point cloud.

  4. Use the nearest neighbour of the grid-cell centre to replace the other points of the flat area in that cell, thereby simplifying the grid processing. The PCD of the flat area is obtained when calculating the sharp points.

  5. Fuse the PCD and delete the same points.

  6. Save the compressed PCD.
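Step 4, the grid compression of the flat region, can be sketched as follows. This is a minimal illustration assuming cubic grid cells of side `cell`, not the authors' exact implementation: all points falling into one cell are replaced by the single point nearest that cell's centre.

```python
import numpy as np

def grid_compress(points, cell=0.5):
    """Grid compression of the flat region: bucket points into cubic cells
    of side `cell` and keep, per cell, only the point nearest the cell
    centre; all other points in the cell are discarded."""
    keys = np.floor(points / cell).astype(int)
    kept = {}
    for p, key in zip(points, map(tuple, keys)):
        centre = (np.array(key) + 0.5) * cell      # centre of this grid cell
        d = np.linalg.norm(p - centre)
        if key not in kept or d < kept[key][0]:
            kept[key] = (d, p)                     # keep the nearest point
    return np.array([p for _, p in kept.values()])
```

Because a real measured point (rather than the synthetic cell centre) is kept, the compressed cloud remains a subset of the original, which also makes the duplicate removal in step 5 straightforward.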

The pedestrian recognition stage then proceeds as follows:

  1. Use YOLO to recognize pedestrians in images.

  2. Construct a mapping between the PCD and the image.

  3. Employ K-means clustering to detect targets.

The processing of the experimental compression

Comparative analysis of the evaluation of the experimental surface area

Evaluation index of point cloud compression experiment

Comparison of the accuracy of image recognition algorithm after point cloud compression

Conclusion

Supplemental Information

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Yanjun Zhang conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code is available in the Supplemental File and the data is available at 3DCOMET: http://www.rovit.ua.es/dataset/3dcomet/downloads.html.

Funding

This study was supported by the Guangdong Provincial Department of Education New Generation Information Technology Key Special Field Fund project: Research on Key Technologies of Encoding and Decoding of Real-time Vehicular Lidar Point Cloud Sequences (No. 2021ZDZX1123). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
