Sample-level sound synthesis with recurrent neural networks and conceptors

Chris Kiefer
Experimental Music Technologies Lab, Department of Music, University of Sussex, Brighton, United Kingdom
DOI
10.7287/peerj.preprints.27361v1
Subject Areas
Artificial Intelligence, Multimedia
Keywords
sound synthesis, machine learning, reservoir computing, conceptors, dynamical systems
Copyright
© 2018 Kiefer
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Kiefer C. 2018. Sample-level sound synthesis with recurrent neural networks and conceptors. PeerJ Preprints 6:e27361v1

Abstract

Conceptors are a recent development in the field of reservoir computing; they can be used to influence the dynamics of recurrent neural networks (RNNs), enabling the generation of arbitrary patterns based on training data. Conceptors allow interpolation and extrapolation between patterns, and also provide a system of Boolean logic for combining patterns together. Generation and manipulation of arbitrary patterns using conceptors has significant potential as a sound synthesis method for applications in computer music and procedural audio, but has yet to be explored.
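As background for readers new to conceptors, a minimal Python sketch may help. It follows Jaeger's standard formulation C = R(R + alpha^-2 I)^-1; the reservoir size, aperture value and variable names are illustrative, and the "loading" of patterns into the recurrent weights required by the full method is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100           # reservoir size (illustrative)
aperture = 10.0   # conceptor aperture, alpha

# Random recurrent and input weights, spectral radius scaled below 1
W = rng.standard_normal((N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal((N, 1))
b = 0.2 * rng.standard_normal((N, 1))

def drive(pattern):
    """Drive the reservoir with a 1-D pattern, collecting states."""
    x = np.zeros((N, 1))
    states = []
    for p in pattern:
        x = np.tanh(W @ x + W_in * p + b)
        states.append(x.ravel())
    return np.array(states).T  # N x T state matrix

# Conceptor from the driven-state correlation matrix:
#   C = R (R + aperture^-2 I)^-1
pattern = np.sin(2 * np.pi * np.arange(300) / 25.0)
X = drive(pattern)
R = (X @ X.T) / X.shape[1]
C = R @ np.linalg.inv(R + aperture ** -2 * np.eye(N))

# Boolean operations on conceptors (Jaeger 2014):
#   NOT C   = I - C
#   C AND B = inv(inv(C) + inv(B) - I)
#   C OR B  = NOT (NOT C AND NOT B)
# At run time the conceptor filters the state update,
#   x <- C tanh(W x + b),
# constraining the network to the subspace of the trained pattern.
```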

Two novel methods of sound synthesis based on conceptors are introduced. Conceptular Synthesis is based on granular synthesis; sets of conceptors are trained to recall varying patterns from a single RNN, then a runtime mechanism switches between them, generating short patterns that are recombined into a longer sound. Conceptillators are trainable, pitch-controlled oscillators for harmonically rich waveforms of the kind commonly used in a variety of sound synthesis applications. Both systems can exploit conceptor pattern morphing, Boolean logic and manipulation of RNN dynamics, enabling new creative sonic possibilities. Experiments reveal how RNN runtime parameters can be used for pitch-independent timestretching and for precise frequency control of cyclic waveforms. They show how these techniques can create highly malleable sound synthesis models, trainable using short sound samples. Limitations are revealed with regard to reproduction quality, along with pragmatic limitations: exponential rises in computation and memory requirements preclude the use of these models for training with longer sound samples.
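As a rough illustration of the runtime mechanism, the sketch below switches between per-grain conceptors while a leak-rate parameter rescales the dynamics. All names are illustrative assumptions; the full models additionally load trained patterns into the recurrent weight matrix and fit the readout W_out by regression, both omitted here.

```python
import numpy as np

def step(x, W, b, C, leak):
    """One autonomous, conceptor-constrained update of a leaky
    reservoir. Scaling `leak` stretches or compresses the network's
    time base, the mechanism used for pitch-independent
    timestretching and frequency control."""
    x_new = np.tanh(W @ x + b)
    return C @ ((1.0 - leak) * x + leak * x_new)

def conceptular_synthesis(conceptors, W, W_out, b, grain_len, leak=1.0):
    """Recombine short grains into a longer sound by switching the
    active conceptor every `grain_len` samples and reading audio out
    through trained readout weights W_out."""
    x = np.zeros((W.shape[0], 1))
    out = []
    for C in conceptors:                    # one conceptor per grain
        for _ in range(grain_len):
            x = step(x, W, b, C, leak)
            out.append((W_out @ x).item())  # scalar audio sample
    return np.array(out)
```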

The techniques presented here represent an initial exploration of the sound synthesis potential of conceptors; future possibilities and research questions are outlined, including possibilities in generative sound.

Author Comment

This is a submission to PeerJ Computer Science for review.

Supplemental Information

The snare drum sample used to train models in experiment 1

DOI: 10.7287/peerj.preprints.27361v1/supp-1

The resynthesised snare drum, output by the trained model in experiment 1

DOI: 10.7287/peerj.preprints.27361v1/supp-2

An 11-point morph between snare and bongo samples using a model trained in experiment 1

DOI: 10.7287/peerj.preprints.27361v1/supp-3

A linear 11-point mix between snare and bongo samples

DOI: 10.7287/peerj.preprints.27361v1/supp-4

The kick drum sample used to train models in experiment 2

DOI: 10.7287/peerj.preprints.27361v1/supp-5

A low-quality resynthesis of the kick drum sample in experiment 2, using the fixed window size method from experiment 1

DOI: 10.7287/peerj.preprints.27361v1/supp-6

Resynthesis of the kick drum in experiment 2, using sample segmentation on zero-crossing points

DOI: 10.7287/peerj.preprints.27361v1/supp-7

An example of timestretching from 50% up to 800% in 50% steps, using the kick drum model in experiment 2

DOI: 10.7287/peerj.preprints.27361v1/supp-8

Sample of the spoken word 'two' used to train models in experiment 3

DOI: 10.7287/peerj.preprints.27361v1/supp-9

Resynthesis of the word 'two', made using the model trained in experiment 3, phase 1

DOI: 10.7287/peerj.preprints.27361v1/supp-10

Resynthesis of the phoneme 'oo', using the model trained in experiment 3, phase 1

DOI: 10.7287/peerj.preprints.27361v1/supp-11

Resynthesis of an analogue square wave, using the trained model from experiment 4

DOI: 10.7287/peerj.preprints.27361v1/supp-12

The oscillator model in experiment 4 under pitch control, using leak rate scaling

The leak rate scale rises linearly from 0 to 2.

DOI: 10.7287/peerj.preprints.27361v1/supp-13

A bassline melody produced by the pitch controlled oscillator model in experiment 4

DOI: 10.7287/peerj.preprints.27361v1/supp-14

An arpeggiated sequence produced from the trained model in experiment 5

DOI: 10.7287/peerj.preprints.27361v1/supp-15

Reconstruction of individual signals using the trained model in experiment 1

Green lines show the original signal and blue lines show the reconstruction.

DOI: 10.7287/peerj.preprints.27361v1/supp-16

An 11-point morph from bongo to snare, created using interpolated conceptors in experiment 1

The y-axes represent amplitude and the x-axes represent time.

DOI: 10.7287/peerj.preprints.27361v1/supp-17

An 11-point morph from bongo to snare, created using linear time-domain mixing in experiment 1

The y-axes represent amplitude and the x-axes represent time.

DOI: 10.7287/peerj.preprints.27361v1/supp-18

Reconstruction of individual signals in experiment 2

Orange lines show the original signal and blue lines show the reconstruction.

DOI: 10.7287/peerj.preprints.27361v1/supp-19