Random forest algorithms for recognizing daily life activities using plantar pressure information: a smart-shoe study

PeerJ


Introduction

Materials and Methods

7-sensor plantar pressure measurement insole

Data collection

Data preprocessing

Feature extraction

  • General statistics analysis: The mean, maximum, standard deviation, and median were calculated for each time series. This category included 56 extracted features (4 statistics × 14 time series).

  • Peak analysis: The number of peaks, the average and standard deviation (SD) of the intervals between peaks, the average and SD of the peak magnitudes, and the average and SD of the peak widths were calculated for each time series using the SciPy library (Jones, Oliphant & Peterson, 2001). Peak widths were calculated at 30% of the peak height; default library parameters were used for all other peak-analysis computations. This category included 98 extracted features (7 features × 14 time series).

  • Gait phase analysis: The envelope of the signals from the seven sensors was calculated for each foot. For each complete stance phase identified, the difference in peak force between initial foot contact (early stance phase) and foot lift-off (late stance phase) was calculated, and the values were averaged over the window. The average duration of the double-float phase was also calculated, or set to the null value when no such phase occurred. Two features were extracted in this category.

  • Frequency domain analysis: The signals of the 14 sensors were summed and a fast Fourier transform (FFT) was applied. Based on preliminary FFT analyses, the following features were extracted from the AC component of the discrete frequency series (0.05–50 Hz) and included in the final analysis: (1) power density, (2) weighted average frequency between 1.67 and 10 Hz, (3) skewness of the frequency components below 10 Hz, (4) mean of the AC components from 2 to 10 Hz, and (5) standard deviation of the same segment. Components below 2 Hz were assumed to reflect the gait cycle, which was expected to be captured by the peak-analysis features described above. Moreover, human movements were assumed not to exceed 10 Hz; therefore, only spectral components below 10 Hz were considered. Five features were extracted in this category.

  • Pressure distribution analysis: The envelope of the signals from sensors 4, 5, 6, and 7, located in the forefoot area (Fig. 1A), was computed. The difference between the mean of this envelope and the mean plantar pressure detected by sensor 1 (heel, Fig. 1A) was calculated for the left and right feet, and the two values were averaged to express the anterior–posterior distribution of plantar pressure. Likewise, the difference between the mean plantar pressures detected by sensor 6 (medial forefoot) and sensor 4 (lateral forefoot) was calculated for each foot, and the values were averaged to express the medial–lateral distribution. Moreover, Pearson correlation coefficients were calculated for each foot between (1) the forefoot envelope (sensors 4–7) and the sensor 1 signal and (2) the sensor 4 and sensor 6 signals. Six features were extracted in this category.
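The general statistics described above amount to four column-wise reductions over one analysis window. The sketch below assumes a window is stored as a NumPy array of shape (n_samples, 14), one column per sensor time series (7 per foot); the function name `general_statistics` is hypothetical, and the population SD (`ddof=0`, NumPy's default) is an assumption, since the article does not state which estimator was used.

```python
import numpy as np

def general_statistics(window):
    """Hypothetical sketch: 4 statistics per column of a (n_samples, n_series) window.

    With 14 series (7 sensors x 2 feet) this yields 4 * 14 = 56 features,
    grouped by statistic: all means, then all maxima, all SDs, all medians.
    """
    window = np.asarray(window, dtype=float)
    return np.concatenate([
        window.mean(axis=0),
        window.max(axis=0),
        window.std(axis=0),          # assumption: population SD (ddof=0)
        np.median(window, axis=0),
    ])
```

For a (2000, 14) window, `general_statistics(window).shape` is `(56,)`, matching the count reported for this category.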
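The seven peak-analysis features can be sketched with SciPy's `find_peaks` and `peak_widths`. Two points are assumptions rather than details from the article: SciPy measures a width at `peak - prominence * rel_height`, so mapping "30% of the peak height" to `rel_height=0.3` (as opposed to 0.7) is our guess, and returning NaN when fewer than two peaks leave no intervals is a hypothetical choice. The function name `peak_features` is also illustrative.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

def peak_features(x, rel_height=0.3):
    """Hypothetical sketch of the 7 peak features for one sensor time series.

    Returns (n_peaks, mean/SD interval, mean/SD magnitude, mean/SD width).
    Intervals and widths are in samples.
    """
    x = np.asarray(x, dtype=float)
    peaks, _ = find_peaks(x)              # default detection parameters
    n = len(peaks)

    def mean_sd(values):
        # Assumption: NaN when no values are available (e.g. < 2 peaks for intervals)
        if len(values) == 0:
            return float("nan"), float("nan")
        return float(np.mean(values)), float(np.std(values))

    intervals = np.diff(peaks)
    magnitudes = x[peaks]
    # rel_height=0.3 -> width measured 30% of the prominence below each peak (assumption)
    widths = peak_widths(x, peaks, rel_height=rel_height)[0] if n else np.array([])
    return (n, *mean_sd(intervals), *mean_sd(magnitudes), *mean_sd(widths))
```

For the toy series `[0, 1, 0, 2, 0, 3, 0]`, three peaks are found two samples apart, with magnitudes 1, 2, and 3; because each peak is a symmetric triangle, every width is 0.6 samples at `rel_height=0.3`. Applying the function to each of the 14 series gives 7 × 14 = 98 features.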
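The article does not specify how stance and double-float phases were detected, so the following is only a minimal sketch under stated assumptions. The "envelope" of the seven sensors is taken as their sample-wise maximum. A foot is considered in contact when its envelope reaches a hypothetical `threshold`. Each complete stance phase (not cut by the window edges) is split in half to obtain the early and late peaks. The "null value" for a missing double-float phase is taken as 0.0. All function and parameter names are illustrative.

```python
import numpy as np

def _runs(mask):
    """Start (inclusive) and end (exclusive) indices of contiguous True runs."""
    padded = np.concatenate(([0], np.asarray(mask).astype(int), [0]))
    d = np.diff(padded)
    return np.flatnonzero(d == 1), np.flatnonzero(d == -1)

def gait_phase_features(left, right, fs, threshold):
    """Hypothetical sketch: left/right are (n_samples, 7) arrays, fs in Hz.

    Returns (mean early-minus-late stance peak, mean double-float duration in s).
    """
    env_l = np.asarray(left, dtype=float).max(axis=1)   # assumption: upper envelope
    env_r = np.asarray(right, dtype=float).max(axis=1)
    diffs = []
    for env in (env_l, env_r):
        starts, ends = _runs(env >= threshold)          # contact phases of one foot
        for s, e in zip(starts, ends):
            # Keep only full stance phases, i.e. not truncated by the window edges
            if s == 0 or e == len(env) or e - s < 2:
                continue
            mid = s + (e - s) // 2
            diffs.append(env[s:mid].max() - env[mid:e].max())
    peak_diff = float(np.mean(diffs)) if diffs else float("nan")

    # Double float: both feet below the contact threshold at the same time
    starts, ends = _runs((env_l < threshold) & (env_r < threshold))
    float_dur = float(np.mean(ends - starts)) / fs if starts.size else 0.0
    return peak_diff, float_dur
```

A real implementation would more likely detect stance from a filtered signal with hysteresis. The fixed threshold here only keeps the sketch short.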
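The frequency-domain features can be sketched with NumPy's real FFT. As a worked check of the quoted 0.05–50 Hz range: a 20 s window at a hypothetical 100 Hz sampling rate gives 2000 samples, so `np.fft.rfftfreq` returns bins spaced 1/20 = 0.05 Hz apart up to the 50 Hz Nyquist limit. The definition of "power density" (mean squared AC amplitude per sample) and the use of amplitude weights for the weighted average frequency are assumptions. `frequency_features` is a hypothetical name.

```python
import numpy as np
from scipy.stats import skew

def frequency_features(window, fs):
    """Hypothetical sketch: window is (n_samples, 14), fs is the sampling rate in Hz."""
    total = np.asarray(window, dtype=float).sum(axis=1)  # sum of the 14 sensor signals
    spectrum = np.abs(np.fft.rfft(total))                # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(total.size, d=1.0 / fs)
    ac = freqs > 0                                       # drop the DC component

    def band(lo, hi):
        return ac & (freqs >= lo) & (freqs <= hi)

    # (1) Assumption: total AC power normalized by the number of samples
    power_density = float(np.sum(spectrum[ac] ** 2) / total.size)
    # (2) Amplitude-weighted average frequency between 1.67 and 10 Hz
    b = band(1.67, 10.0)
    weighted_freq = float(np.sum(freqs[b] * spectrum[b]) / np.sum(spectrum[b]))
    # (3) Skewness of the AC components below 10 Hz
    skewness = float(skew(spectrum[ac & (freqs < 10.0)]))
    # (4), (5) Mean and SD of the AC components from 2 to 10 Hz
    b2 = band(2.0, 10.0)
    return (power_density, weighted_freq, skewness,
            float(spectrum[b2].mean()), float(spectrum[b2].std()))
```

For a 5 Hz sine plus a constant offset, the weighted average frequency comes out at 5 Hz, because the offset falls in the excluded DC bin.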
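The six pressure-distribution features can be sketched as follows. This sketch assumes each foot is a (n_samples, 7) array whose column i holds sensor i + 1, and that the forefoot "envelope" is the sample-wise maximum of sensors 4–7. The article does not define the envelope, so that choice and the name `pressure_distribution_features` are hypothetical.

```python
import numpy as np

HEEL, LAT_FORE, MED_FORE = 0, 3, 5   # sensors 1, 4, 6 as 0-based column indices
FOREFOOT = [3, 4, 5, 6]              # sensors 4-7

def pressure_distribution_features(left, right):
    """Hypothetical sketch: returns [AP, ML, r_env_heel (L, R), r_s4_s6 (L, R)]."""
    ap, ml, r_ap, r_ml = [], [], [], []
    for foot in (np.asarray(left, float), np.asarray(right, float)):
        env = foot[:, FOREFOOT].max(axis=1)              # assumption: upper envelope
        ap.append(env.mean() - foot[:, HEEL].mean())     # anterior-posterior difference
        ml.append(foot[:, MED_FORE].mean() - foot[:, LAT_FORE].mean())  # medial-lateral
        r_ap.append(np.corrcoef(env, foot[:, HEEL])[0, 1])
        r_ml.append(np.corrcoef(foot[:, LAT_FORE], foot[:, MED_FORE])[0, 1])
    # The two differences are averaged over feet; correlations are kept per foot
    return [float(np.mean(ap)), float(np.mean(ml)), *map(float, r_ap), *map(float, r_ml)]
```

Averaging the two differences but keeping four per-foot correlations gives 2 + 4 = 6 features, the count reported for this category.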

Design of activity recognition algorithms

Data analysis framework

  1. The window length analysis aims to identify the optimum analytic window length.

  2. The analysis of a pre-selected set of 25 sensor configurations (i.e., configurations differing in the number and/or location of sensors) aims to identify the best hardware combination for each possible number of sensors from 1 to 6 (the 7-sensor configuration has only one possible combination). This analysis was conducted using the optimum window length identified in (1).

  3. A final analysis of the contribution of each feature to the forest outputs aims to find the most efficient number of features for each of the seven best sensor configurations identified in (2). Again, this analysis was conducted using the optimum window length identified in (1).
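Step (3) can be illustrated with scikit-learn's random forest, whose `feature_importances_` attribute gives a Gini-based contribution of each feature. This is a minimal sketch, not the authors' procedure. Ranking by impurity-based importance, the 70/30 stratified split, 100 trees, and the name `rank_and_select` are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def rank_and_select(X, y, ks, seed=0):
    """Hypothetical sketch: rank features by importance, then retrain on the top k.

    Returns (feature indices sorted by decreasing importance, {k: test accuracy}).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    # Rank features on the training split only, so the test split stays unseen
    full = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    order = np.argsort(full.feature_importances_)[::-1]
    scores = {}
    for k in ks:
        cols = order[:k]                                 # k most important features
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_tr[:, cols], y_tr)
        scores[k] = rf.score(X_te[:, cols], y_te)        # rate of good predictions
    return order, scores
```

Sweeping `ks` and plotting the scores shows where adding features stops improving recognition. The same loop could be repeated for each sensor configuration retained in step (2).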

Stage 1: window length

Stage 2: number and location of sensors

Stage 3: number of features

Results

Stage 1: window length

Stage 2: number and location of sensors

  • 6 sensors, 145 features (heel, lateral midfoot, lateral forefoot, medial forefoot, center of the midfoot, and center of the forefoot): 0.89 (min: 0.82, max: 0.92)

  • 6 sensors, 142 features (heel, lateral midfoot, lateral forefoot, big toe, center of the midfoot, and center of the forefoot): 0.89 (min: 0.83, max: 0.91)

  • 5 sensors, 120 features (heel, lateral midfoot, lateral forefoot, center of the midfoot, and center of the forefoot): 0.89 (min: 0.85, max: 0.92)

  • 4 sensors, 98 features (heel, lateral midfoot, lateral forefoot, and center of the forefoot): 0.89 (min: 0.85, max: 0.92)
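The feature counts listed above can be reproduced from the per-category counts in the Feature extraction section. Each sensor contributes 11 features per foot (4 general statistics + 7 peak features), 7 features (2 gait-phase + 5 frequency-domain) are assumed to be present in every configuration, and pressure-distribution features are added only when their sensors are available. The availability rule and the mapping of sensors 4–7 to the location names below are our reconstruction from the reported numbers, not the authors' code; `n_features` is a hypothetical helper.

```python
PER_SERIES = 4 + 7           # general statistics + peak analysis, per sensor and foot
CONFIG_INDEPENDENT = 2 + 5   # gait phase + frequency domain (assumed always present)

# Assumption: the forefoot group (sensors 4-7) covers these four locations
FOREFOOT = {"lateral forefoot", "medial forefoot", "center of the forefoot", "big toe"}

def n_features(locations):
    """Hypothetical reconstruction of the feature count for one sensor configuration."""
    locs = set(locations)
    n = PER_SERIES * 2 * len(locs) + CONFIG_INDEPENDENT      # x2 for left and right
    if "heel" in locs and locs & FOREFOOT:
        n += 3   # anterior-posterior difference + envelope/heel correlation (L, R)
    if {"lateral forefoot", "medial forefoot"} <= locs:
        n += 3   # medial-lateral difference + sensor 4/6 correlation (L, R)
    return n
```

Under this rule, the 4-sensor configuration gives 11 × 2 × 4 + 7 + 3 = 98, and the full 7-sensor configuration gives 11 × 14 + 7 + 6 = 167.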

Stage 3: number of features

Supplementary results

Discussion

Performances

Comparison with previous studies and originality

Temporal resolution, sensor configuration, number of features, manufacturing, and algorithmic considerations

Limitations and strengths

Conclusions

Supplemental Information

One example of one decision tree in one selected forest.

The tree is composed of 101 nodes and leaves; this number varies from tree to tree. During training, nodes are split until each leaf contains data points from a single activity only (pure offspring). At each node, the decision is based on the parameter that best discriminates the sample into two sub-samples. Gini: sample impurity, a score from 0 to 1, with 0 indicating pure offspring. Samples: number of data points evaluated at the node. Value ([cycling, downstairs, office, run, sitting, slope, standing, upstairs, walking]): weight of each activity in the evaluated sample; 0 indicates the absence of data points for a given activity. Class: activity with the most data points. The tree is extracted from the following forest: window length: 20 sec, configuration: 7 sensors, assignment: 1, run: 1.

DOI: 10.7717/peerj.10170/supp-1
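The Gini value printed in each node of the supplemental tree is 1 minus the sum of squared class proportions. The helpers below are a minimal sketch of that formula and of the weighted child impurity that a split tries to minimize. They follow the standard CART criterion and are not taken from the authors' repository; the function names are hypothetical. With nine activity classes, the largest reachable value is 1 − 1/9 ≈ 0.889, so the 0-to-1 range in the caption is an upper bound.

```python
def gini_impurity(counts):
    """Gini impurity of a node from per-class counts (or weights)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_child_impurity(left_counts, right_counts):
    """Size-weighted impurity of the two children produced by a split."""
    n_l, n_r = sum(left_counts), sum(right_counts)
    n = n_l + n_r
    return (n_l / n) * gini_impurity(left_counts) + (n_r / n) * gini_impurity(right_counts)
```

For example, a parent node with counts `[5, 5]` has impurity 0.5. A split into `[5, 0]` and `[0, 5]` yields two pure leaves with a weighted impurity of 0.0, which is the largest possible decrease for that node.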

Summary of results for a larger panel of 67 sensor configurations (window length: 20 seconds, number of features: maximum, 20, best).

DOI: 10.7717/peerj.10170/supp-2

Summary of results for a larger panel of 67 sensor configurations (window lengths: all, number of features: best).

Blue: best single forest. Orange: best average rate of good predictions.

DOI: 10.7717/peerj.10170/supp-3

Summary of the supplementary analyses following the formats of Figures 5 and 7.

(A) Effect of window length on the activity recognition rate (best combination for "number and location of sensors" and "number of features"). Pink boxes: 1, 5, 10, and 15 seconds. Green box: 20 seconds (considered optimum). Yellow boxes: 25, 30, 35, 40, 45, 50, 55, and 60 seconds. Red diamonds: mean values. (B) Activity recognition performance of random forest algorithms for 25 sensor configurations (best combination for "window length" and "number of features"). Green boxes: sensor configurations expected to perform well. Pink boxes: sensor configurations expected to perform poorly. Red diamonds: mean values. (C) Activity recognition performance of random forest algorithms for 25 sensor configurations ("window length": 20 seconds, "number of features": best average rate of good predictions). Green and pink boxes: same color coding as in panel B.

DOI: 10.7717/peerj.10170/supp-4

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Dian Ren conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Nathanael Aubert-Kato performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Emi Anzai performed the experiments, prepared figures and/or tables, and approved the final draft.

Yuji Ohta performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Julien Tripette conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Human Ethics

The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):

The experimental protocol was approved by the Ochanomizu University Research Ethics Committee (#2018-01).

Data Availability

The following information was supplied regarding data availability:

The code is available on GitHub: https://github.com/dian-R/CodeForSmartShoesPeerJ.

The data are available at Zenodo: Ren, Dian, Aubert-Kato, Nathanael, Anzai, Emi, Ohta, Yuji, & Tripette, Julien. (2020). Data for: “Random forest algorithms for recognizing daily life activities using plantar pressure information: A smart-shoe study” [Data set]. PeerJ. Zenodo. DOI 10.5281/zenodo.4050390.

Funding

The authors received no funding for this work.
