Spatial position constraint for unsupervised learning of speech representations

PeerJ Computer Science


Introduction

  • Can the proposed geometry-based auto-encoder, trained on unlabeled speech, learn to extract speech features that are useful for the keyword spotting task?

  • How does the proposed method perform compared with traditional cepstral speech features?

  • Is the proposed method applicable to a low-resource language, such as the Kadazan language?

Literature Review

Unsupervised representation learning

Representation learning for speech

Geometric distance in speech representation learning

Method

Spatial position constraint

Auto-encoder architecture

Experiments and Results

Dataset and experimental setup

Results and analysis

Kadazan language analysis

Conclusions

Supplemental Information

Spatial-AE Python code

DOI: 10.7717/peerj-cs.650/supp-1

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Mohammad Ali Humayun, Hayati Yassin and Pg Emeroylariffion Abas conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code file for our tests is available in the Supplementary File.

The dataset used for analysis is available from TensorFlow: https://www.tensorflow.org/datasets/catalog/speech_commands.

Funding

The authors received no funding for this work.
