PeerJ Computer Science Preprints: Multimedia
https://peerj.com/preprints/index.atom?journal=cs&subject=10500
Multimedia articles published in PeerJ Computer Science Preprints

Machine learning of symbolic compositional rules with genetic programming: Dissonance treatment in Palestrina
https://peerj.com/preprints/27731 | 2019-05-15 | Torsten Anders, Benjamin Inden
We describe a method to automatically extract symbolic compositional rules from music corpora that can be combined with each other and with manually programmed rules for algorithmic composition, and we report preliminary results of applying that method. As the machine learning technique we chose genetic programming, because it is capable of learning formulas consisting of both logic and numeric relations; to our knowledge, genetic programming has not previously been used for this purpose. We therefore investigate a well-understood case in this pilot study: the dissonance treatment in Palestrina’s music. We label dissonances with a custom algorithm, automatically cluster melodic fragments with labelled dissonances into different dissonance categories (passing tone, suspension, etc.) with the DBSCAN algorithm, and then learn rules describing the dissonance treatment of each category with genetic programming. As positive examples we use dissonances from a given category. As negative examples we use all other dissonances; melodic fragments without dissonances; purely random melodic fragments; and slight random transformations of positive examples. The learnt rules capture the melodic features of the dissonance categories very well, though some of the best resulting rules allow for minor deviations from the positive examples (e.g., allowing the dissonance category suspension to occur also on shorter notes).
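
For illustration, the clustering step can be sketched as follows: melodic fragments around labelled dissonances are encoded as fixed-length feature vectors and grouped into tentative dissonance categories with DBSCAN. This is a minimal sketch under assumed features and parameter values; the feature set, eps and min_samples are not taken from the paper.

import numpy as np
from sklearn.cluster import DBSCAN

# Each row: one dissonant note with simple melodic context features, e.g.
# [duration, metric position, interval from previous note, interval to next note].
# The values are invented for illustration; real features would be extracted
# from the Palestrina corpus and normalised.
fragments = np.array([
    [1.0, 2.0, -2, -2],   # stepwise motion through the dissonance (passing-tone-like)
    [1.0, 3.0, -2, -2],
    [2.0, 1.0,  0, -2],   # repeated/tied note resolving down by step (suspension-like)
    [2.0, 1.0,  0, -1],
])

labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(fragments)
print(labels)  # fragments in the same cluster share a tentative dissonance category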

Sample-level sound synthesis with recurrent neural networks and conceptors
https://peerj.com/preprints/27361 | 2018-11-19 | Chris Kiefer
Conceptors are a recent development in the field of reservoir computing; they can be used to influence the dynamics of recurrent neural networks (RNNs), enabling the generation of arbitrary patterns based on training data. Conceptors allow interpolation and extrapolation between patterns, and also provide a system of Boolean logic for combining patterns. Generation and manipulation of arbitrary patterns using conceptors has significant potential as a sound synthesis method for applications in computer music and procedural audio, but this potential has yet to be explored.
Two novel methods of sound synthesis based on conceptors are introduced. Conceptular Synthesis is based on granular synthesis: sets of conceptors are trained to recall varying patterns from a single RNN, and a runtime mechanism then switches between them, generating short patterns which are recombined into a longer sound. Conceptillators are trainable, pitch-controlled oscillators for harmonically rich waveforms of the kind commonly used in a variety of sound synthesis applications. Both systems can exploit conceptor pattern morphing, Boolean logic and manipulation of RNN dynamics, enabling new creative sonic possibilities. Experiments reveal how RNN runtime parameters can be used for pitch-independent timestretching and for precise frequency control of cyclic waveforms. They show how these techniques can create highly malleable sound synthesis models that are trainable using short sound samples. Limitations are revealed with regard to reproduction quality, and practical limitations are also shown: exponential rises in computation and memory requirements preclude training these models with longer sound samples.
The techniques presented here represent an initial exploration of the sound synthesis potential of conceptors; future possibilities and research questions are outlined, including possibilities in generative sound.
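
To make the conceptor mechanism that both methods build on more concrete, the numpy sketch below follows Jaeger's formulation: a conceptor C is computed from the correlation matrix of reservoir states collected while the RNN is driven by a training pattern, and is then inserted into the update loop so that the autonomous dynamics stay in the subspace occupied by that pattern. The reservoir size, aperture and driving signal are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
N = 100                                        # reservoir size (assumed)
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))    # recurrent weights
W_in = rng.normal(0, 1.0, (N, 1))              # input weights
b = rng.normal(0, 0.2, (N, 1))                 # bias

# Drive the reservoir with a training pattern (here: a sine wave) and collect states.
pattern = np.sin(2 * np.pi * np.arange(500) / 25.0)
x = np.zeros((N, 1))
states = []
for p in pattern:
    x = np.tanh(W @ x + W_in * p + b)
    states.append(x.ravel())
X = np.array(states).T                         # N x L state collection matrix

# Conceptor: C = R (R + alpha^-2 I)^-1, with R the state correlation matrix.
alpha = 10.0                                   # aperture (assumed)
R = X @ X.T / X.shape[1]
C = R @ np.linalg.inv(R + alpha ** -2 * np.eye(N))

# Autonomous recall: the conceptor filters the state at every update step,
# constraining the RNN to the dynamics learned from the training pattern.
for _ in range(100):
    x = C @ np.tanh(W @ x + b)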

Hear and See: End-to-end sound classification and visualization of classified sounds
https://peerj.com/preprints/27280 | 2018-10-15 | Thomas Miano
Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through a wide variety of methods, including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate, via visualization, the sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can successfully map an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.
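
The transfer-learning idea can be sketched in a few lines: spectrograms of audio clips are rendered as images and fed to a CNN pretrained on ImageNet, with only the final layer replaced and retrained for sound classes. The model choice (ResNet-18), the number of classes and the audio parameters below are assumptions for illustration, not the configuration used in this work.

import torch
import torchaudio
import torchvision

NUM_CLASSES = 10                              # assumed number of sound classes

# Turn a 1-second waveform into a log-mel spectrogram "image" with 3 channels.
waveform = torch.randn(1, 16000)              # placeholder for a real audio clip
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=128)(waveform)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel)
image = log_mel.unsqueeze(0).repeat(1, 3, 1, 1)   # batch x 3 x mels x frames

# Pretrained image model; only the classification head is replaced (and would be
# the only part trained on the sound data).
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)

model.eval()
logits = model(image)                         # one score per sound class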

Creating an open geospatial kinetic wall for science mediation: Preliminary results from the GeoWall project
https://peerj.com/preprints/27236 | 2018-09-25 | Massimiliano Cannata, Milan Antonovic, Serena Cangiano, Marco Lurati
GeoWall is an innovative interactive physical wall based on advanced Open technologies that enables the dissemination of scientific content to a wide public. The system, in line with Open Science principles, aims to promote the link between science, local impacts, education and the global context, so that the intrinsic value of the university is clearly communicated to and understood by citizens. GeoWall is conceived as an interactive panel that combines touch technology, web services, visualization on maps, multimedia content and digital graphics in a lightweight kinetic structure realized with digital manufacturing techniques. On the front, a touch screen and multiple moving screens display the information, while on the back a panel machined with a numerically controlled milling machine represents the scientific data element. The system is designed to use open technologies as much as possible, including Open Software and Open Hardware. The touch screen enables user interaction by selecting global challenges and local impacts represented on a dynamic map. When an interaction is activated, additional information is communicated through smaller moving screens presenting different multimedia sources and formats. Although the system is not yet complete, the first tests are very promising and already give clear indications of the appropriateness of the selected concept and the hardware and software realized so far.

An interactive tool for teaching map projections
https://peerj.com/preprints/27218 | 2018-09-16 | Magnus Heitzler, Hans-Rudolf Bär, Roland Schenkel, Lorenz Hurni
Map projections are one of the fundamental concepts of geographic information science and cartography. An understanding of the different variants and their properties is critical when creating maps or carrying out geospatial analyses. To support learning about map projections, we present an online tool that allows users to interactively explore the construction process of map projections. A central 3D view shows the three main building blocks for perspective map projections: the globe, the projection surface (cone, cylinder, plane) and the projection center. Interactively adjusting these objects makes it possible to create a multitude of arrangements forming the basis for common map projections. Further insights can be gained by adding supplementary information, such as projection lines and Tissot’s indicatrices. Once all objects have been arranged in the desired way, the projection surface can be unrolled to form the final flat map. Currently, the tool is limited to visualizing the construction of true perspective map projections. In the future, the prime concerns are to increase the genericity of the application to support more map projections and to integrate it into the GITTA (Geographic Information Technology Training Alliance) platform.
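
As a concrete example of the kind of construction the tool visualizes, the sketch below computes a true perspective projection onto a plane: the gnomonic case, where the projection center sits at the centre of the globe and the plane is tangent at a chosen point. The formulas are the standard gnomonic projection equations; the sample point and tangent point are arbitrary.

import math

def gnomonic(lat, lon, lat0=0.0, lon0=0.0, radius=1.0):
    """Project geographic coordinates (degrees) onto a plane tangent at (lat0, lon0)."""
    lat, lon, lat0, lon0 = map(math.radians, (lat, lon, lat0, lon0))
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= 0:
        raise ValueError("point lies on the far hemisphere and cannot be projected")
    x = radius * math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = radius * (math.cos(lat0) * math.sin(lat)
                  - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y

print(gnomonic(45.0, 30.0))   # plane coordinates of 45N, 30E on a unit globe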

Multi-label classification of frog species via deep learning
https://peerj.com/preprints/3007 | 2017-06-06 | Jie Xie
Acoustic classification of frogs has received increasing attention for its promising application in ecological studies. Various methods have been proposed for classifying frog species, but most assume that each recording contains only a single species. In this study, a method to classify multiple frog species in an audio clip is presented. To be specific, continuous frog recordings are first cropped into audio clips of 10 seconds. Then, various time-frequency representations are generated for each 10-s clip. Next, instead of using traditional hand-crafted features, a deep learning algorithm is used to find the most important features. Finally, a binary-relevance-based multi-label classification approach is proposed to classify simultaneously vocalizing frog species with our proposed features. Experimental results show that our features extracted using deep learning achieve better classification performance than hand-crafted features for frog call classification.
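
The binary relevance step can be sketched as follows: one independent binary classifier is trained per frog species, and a clip is labelled with every species whose classifier fires. The random feature vectors below stand in for the learned deep features, and the species names, classifier choice and data sizes are illustrative assumptions rather than the setup used in the study.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
species = ["L. caerulea", "L. fallax", "C. ornatus"]          # assumed label set

X = rng.normal(size=(200, 32))                                # deep features per 10-s clip
Y = (rng.random((200, len(species))) < 0.3).astype(int)       # multi-label targets

# Binary relevance: each column of Y gets its own independent binary classifier.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

new_clip = rng.normal(size=(1, 32))
predicted = clf.predict(new_clip)[0]
print([name for name, hit in zip(species, predicted) if hit]) # species heard in the clip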

Luminophonics experiment: A user study on visual sensory substitution device
https://peerj.com/preprints/1289 | 2015-08-10 | Shern Shiou Tan, Tomas Maul, Neil Mennie
Loss of vision is a severe impairment to the dominant sensory system. It often has a catastrophic effect upon the sufferer, with knock-on effects on their standard of living, their ability to support themselves, and their care-givers’ lives. Research into visual impairments is multi-faceted, focusing on the causes of these debilitating conditions as well as attempting to ease the daily lives of affected individuals. One approach is the use of sensory substitution devices. Our proposed system, Luminophonics, focuses on visual-to-auditory cross-modal information conversion. A visual-to-audio sensory substitution device is a system that obtains a continual stream of visual inputs and converts it into a corresponding auditory soundscape. Ultimately, such a device allows the visually impaired to perceive the surrounding environment simply by listening to the generated soundscape. Even though there is huge potential for this kind of device, public usage is still minimal (Loomis, 2010). In order to promote adoption by the visually impaired, the overall performance of these devices needs to be improved in terms of soundscape interpretability, information preservation and listening comfort, amongst other factors. Luminophonics has developed three prototypes, which we have used to explore different ideas pertaining to visual-to-audio sensory substitution. In addition, one of the prototypes has been extended to include depth information using a time-of-flight camera. Previously, an automated measurement method was used to evaluate the performance of the three prototypes (Tan, 2013). The results of that measurement cover effectiveness in terms of interpretability and information preservation. The main purpose of the experiment reported herein was to test the prototypes on human subjects in order to gain greater insight into how they perform in real-life situations.
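
For readers unfamiliar with the idea, the sketch below illustrates the general visual-to-audio conversion principle rather than the specific Luminophonics mappings: image columns are scanned left to right over a fixed duration, each row drives a sine oscillator whose frequency rises towards the top of the image, and pixel brightness sets the oscillator's amplitude. Image size, scan duration and frequency range are arbitrary assumptions.

import numpy as np

def image_to_soundscape(image, duration=1.0, sample_rate=22050,
                        f_low=200.0, f_high=4000.0):
    rows, cols = image.shape
    samples_per_col = int(duration * sample_rate / cols)
    freqs = np.geomspace(f_high, f_low, rows)        # top image row -> highest pitch
    out = []
    for c in range(cols):                            # scan columns left to right
        t = np.arange(samples_per_col) / sample_rate
        col_audio = sum(image[r, c] * np.sin(2 * np.pi * freqs[r] * t)
                        for r in range(rows))        # brightness controls amplitude
        out.append(col_audio)
    audio = np.concatenate(out)
    return audio / (np.max(np.abs(audio)) + 1e-9)    # normalise to [-1, 1]

frame = np.random.rand(16, 32)                       # placeholder for a camera frame
soundscape = image_to_soundscape(frame)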

Auditory interfaces in automated driving: an international survey
https://peerj.com/preprints/1069 | 2015-07-16 | Pavlo Bazilinskyy, Joost C. F. De Winter
This study investigated people’s opinions of auditory interfaces in contemporary cars and their willingness to be exposed to auditory feedback in automated driving. We used an Internet-based survey to collect 1,205 responses from 91 countries. The participants stated their attitudes towards two existing auditory driver assistance systems, a parking assistant (PA) and a forward collision warning system (FCWS), as well as towards a futuristic augmented sound system (FS) proposed for fully automated driving. The respondents were positive towards the PA and FCWS, and rated their willingness to have these systems at 3.87 and 3.77, respectively (1 = disagree strongly, 5 = agree strongly). The respondents tolerated the FS. The results showed that a female voice is the most preferred feedback mode for supporting takeover requests in highly automated driving, regardless of whether or not the respondents’ country is English speaking. The present results could be useful for designers of automated vehicles and other stakeholders.

The Shake Stick
https://peerj.com/preprints/995 | 2015-04-21 | Alex Wilson, Abram Hindle
We present a new embedded instrument, with a discussion of the challenges of developing embedded instruments and of the practice and theory of NIME evaluation and design. The Shake Stick is a Raspberry Pi-based embedded instrument that uses SuperCollider for granular synthesis. In our analysis and design, we explore the MINUET design framework, dimension space analysis for inter-instrument comparison, and learning curves. Furthermore, we discuss lessons learned from using the instrument in group improvisation, as well as challenges and prospects for the creation of the sound palettes used in granular synthesis.
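
The instrument itself runs SuperCollider on the Raspberry Pi; the numpy sketch below only illustrates the underlying granular synthesis technique: short Hann-windowed grains are read from random positions in a source sample and overlap-added into an output buffer. Grain length, grain count and the synthetic source sample are illustrative assumptions.

import numpy as np

def granulate(source, out_len, grain_len=1024, n_grains=400, seed=0):
    rng = np.random.default_rng(seed)
    window = np.hanning(grain_len)                         # fade each grain in and out
    out = np.zeros(out_len)
    for _ in range(n_grains):
        read = rng.integers(0, len(source) - grain_len)    # grain start in the sample
        write = rng.integers(0, out_len - grain_len)       # grain position in the output
        out[write:write + grain_len] += source[read:read + grain_len] * window
    return out / (np.max(np.abs(out)) + 1e-9)              # normalise to [-1, 1]

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
source = np.sin(2 * np.pi * 220 * t)          # stand-in for a recorded sound palette
cloud = granulate(source, out_len=2 * sample_rate)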