Deepfake video detection: YOLO-Face convolution recurrent approach

PeerJ Computer Science


Introduction

  • A refined version of the YOLO-Face detector is presented to detect face regions in video frames, improving the performance of video-authenticity detection.

  • A fine-tuned convolutional recurrent neural network, EfficientNet-B5 Bi-LSTM, is introduced to extract spatial-temporal features from short sequences of frames for detecting video authenticity. Because a deepfake video is generated by synthesizing faces frame by frame, the pixel values in the synthesized regions are not coherent and consistent in their spatial and temporal information.

  • A combined CelebDF-FaceForensics++ (c23) dataset is introduced. It provides an integrated and diverse deepfake dataset and helps improve the applicability of the deepfake detection model in the real world.

  • A comprehensive analysis of several deep-learning models applied in the context of deepfake detection is presented in terms of AUROC, accuracy, recall, precision, and F-measure.
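The two-stage idea in the contributions above (per-frame spatial features followed by a sequence-level temporal decision) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: `spatial_features` substitutes for the EfficientNet-B5 backbone, `temporal_score` substitutes for the Bi-LSTM head, and `FEATURE_DIM` and `SEQ_LEN` are assumed dimensions.

```python
import numpy as np

FEATURE_DIM = 2048   # assumed size of the spatial feature vector (illustrative)
SEQ_LEN = 20         # assumed number of face crops per video clip (illustrative)

def spatial_features(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN backbone: map (T, H, W, 3) face crops to (T, D)."""
    t = frames.shape[0]
    # Placeholder: mean-pool each frame's pixels, then broadcast to FEATURE_DIM.
    pooled = frames.reshape(t, -1).mean(axis=1, keepdims=True)
    return np.tile(pooled, (1, FEATURE_DIM))

def temporal_score(features: np.ndarray) -> float:
    """Stand-in for the Bi-LSTM head: one real/fake score per sequence.

    A trained temporal model learns inter-frame dependencies; here we only
    measure frame-to-frame feature variation, echoing the observation that
    synthesized faces are spatially and temporally incoherent.
    """
    diffs = np.abs(np.diff(features, axis=0)).mean()
    return float(1.0 / (1.0 + np.exp(-diffs)))  # squash to (0, 1)

clip = np.random.rand(SEQ_LEN, 64, 64, 3)  # dummy clip of face crops
feats = spatial_features(clip)             # shape (SEQ_LEN, FEATURE_DIM)
score = temporal_score(feats)              # scalar in (0, 1)
```

The key design point mirrored here is the split: the spatial model sees one frame at a time, while the temporal model sees the whole feature sequence.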

The proposed method

Dataset description

Experimental result analysis

Performance measures

Experimental results and analysis

where b_ik denotes the number of instances misclassified by method i but identified correctly by method k, and c_ki denotes the number of instances misclassified by method k but identified correctly by method i. If the computed test statistic exceeds the chi-squared critical value of 3.84 (one degree of freedom, 95% confidence level), the difference between the two classification methods' results is statistically significant. Table 6 shows McNemar's test comparison between the proposed method and the other state-of-the-art methods, trained on the FF++ (c23) dataset and tested on the Celeb-DF dataset. As can be seen from Table 6, McNemar's test confirms that the differences in classification success are statistically significant for every pairwise comparison between the proposed method and the other state-of-the-art methods.
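As a concrete illustration, McNemar's statistic with the standard continuity correction can be computed as below. The counts b and c are made up for demonstration and do not come from Table 6.

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """McNemar's chi-squared statistic with continuity correction.

    b: instances misclassified by method i but correct under method k.
    c: instances misclassified by method k but correct under method i.
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

CHI2_CRIT_95 = 3.84  # chi-squared critical value, 1 degree of freedom, alpha = 0.05

# Illustrative discordant-pair counts (not from the paper):
b, c = 40, 15
stat = mcnemar_statistic(b, c)
significant = stat > CHI2_CRIT_95
```

With b = 40 and c = 15 the statistic is (|40 - 15| - 1)^2 / 55 ≈ 10.47, which exceeds 3.84, so the two methods' error patterns would be judged significantly different.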

Conclusion and future work

Supplemental Information

Extract video frames.

DOI: 10.7717/peerj-cs.730/supp-1

Extract YOLO faces, part 1.

DOI: 10.7717/peerj-cs.730/supp-2

Extract YOLO faces, part 2.

DOI: 10.7717/peerj-cs.730/supp-3

Prepare data using pasting approach.

DOI: 10.7717/peerj-cs.730/supp-4

Prepare data using bootstrap aggregating approach.

DOI: 10.7717/peerj-cs.730/supp-5

Train data using EfficientNet-B5.

DOI: 10.7717/peerj-cs.730/supp-6

Extract video features for the Bidirectional-LSTM.

DOI: 10.7717/peerj-cs.730/supp-7

Train the temporal model Bidirectional-LSTM.

DOI: 10.7717/peerj-cs.730/supp-8

Test on the 518-video test set.

DOI: 10.7717/peerj-cs.730/supp-9

Code Steps.

1. extract-video-frames.py: Extract video frames

2. yoloface.py, utils.py: Extract YOLO faces (parts 1 and 2)

3. prepareData-Pasting.py: Prepare the combined CelebDF-FaceForensics++ (c23) data using the pasting approach

4. prepareData-Bootstrap aggregating.py: Prepare the combined CelebDF-FaceForensics++ (c23) data using the bootstrap aggregating approach

5. Train.py: Train EfficientNet-B5 (pre-trained with noisy-student weights) on each training set to learn spatial features

6. BidirectionLSTM-Features.py: Extract video features for the Bidirectional-LSTM

7. Train-Conv-Bi-LSTM.py: Train the temporal model (Bidirectional-LSTM) to learn the sequences

8. evaluate-model.py: Test on the 518-video test set of the Celeb-DF dataset

DOI: 10.7717/peerj-cs.730/supp-10
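Steps 3 and 4 of the code listing differ only in how each training set is drawn from the combined dataset. A minimal sketch of the two sampling schemes, with made-up stand-in data (the scripts above operate on actual face crops):

```python
import random

def pasting_sample(data, n):
    """Pasting: draw n items WITHOUT replacement (no duplicates)."""
    return random.sample(data, n)

def bagging_sample(data, n):
    """Bootstrap aggregating: draw n items WITH replacement (duplicates allowed)."""
    return [random.choice(data) for _ in range(n)]

random.seed(42)
frames = list(range(100))            # stand-ins for face-crop indices
paste = pasting_sample(frames, 30)   # 30 distinct items
bag = bagging_sample(frames, 30)     # 30 items, repeats likely
```

Pasting gives each member model a disjoint-style subset with no repeated samples, while bootstrap aggregating lets the same sample appear multiple times in one training set; both are standard ways to build diverse training subsets for an ensemble.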

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Aya Ismail conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Marwa Elpeltagy conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Mervat Zaki conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Kamal A. ElDahshan conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The FaceForensics++ dataset is available from the FaceForensics site: https://github.com/ondyari/FaceForensics. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, and Nießner M. 2019. FaceForensics++: Learning to detect manipulated facial images. Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1–11.

The Celeb-DF dataset is available from the celeb-deepfakeforensics site: https://github.com/yuezunli/celeb-deepfakeforensics. Li Y, Yang X, Sun P, Qi H, and Lyu S. 2020. Celeb-df: A large-scale challenging dataset for deepfake forensics. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3207–3216.

The Python scripts are available in the Supplemental Files.

Funding

The authors received no funding for this work.
