Hear and See: End-to-end sound classification and visualization of classified sounds

Center for Data Science, RTI International, Durham, North Carolina, United States of America
DOI
10.7287/peerj.preprints.27280v1
Subject Areas
Artificial Intelligence, Computer Vision, Data Mining and Machine Learning, Data Science, Multimedia
Keywords
convolutional neural networks, sound classification, transfer learning, neural activation visualization, end-to-end processing, machine learning, multimedia, spectrograms, human-computer interaction, python
Copyright
© 2018 Miano
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Miano T. 2018. Hear and See: End-to-end sound classification and visualization of classified sounds. PeerJ Preprints 6:e27280v1

Abstract

Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through a wide variety of methods, including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate those recognized sounds via visualization. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can successfully map an image classification task to a sound classification task. Our work offers a novel application in which the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.
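The mapping from sound classification to image classification described above rests on converting audio into a spectrogram, a two-dimensional time-frequency representation that a pretrained image CNN can consume. The sketch below is illustrative only, assuming a plain NumPy short-time FFT rather than the paper's actual pipeline; the function name and parameters are hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a windowed short-time FFT.

    Returns a 2-D array (frequency bins x frames) that can be treated
    as a grayscale image and passed to an image-classification CNN.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)
    # Log scaling compresses dynamic range, as is typical before CNN input.
    return 20.0 * np.log10(mag + 1e-10)

# A 1-second 440 Hz tone sampled at 16 kHz, standing in for recorded audio.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = spectrogram(tone)
print(spec.shape)  # 257 frequency bins x 61 frames
```

In a transfer-learning setup, arrays like `spec` would be resized and replicated across channels to match the input shape of a network pretrained on out-of-domain image data (e.g., ImageNet), whose final layer is then retrained on sound classes.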

Author Comment

This is a submission to PeerJ Computer Science for review.