Language and music are peculiar human behaviors. We spend a large portion of our lives speaking, reading, processing speech, performing music and listening to tunes. Yet we still know very little about why and how these structured behaviors emerged in our species. For music, the mystery is even greater than for language: music is a widespread human behavior that does not seem to confer any evolutionary advantage. One approach to studying the origins of music is to hypothesize and empirically test the mechanisms behind this structured behavior [1, 2]. For language, potential mechanisms were first tested in silico [3, 4], showing how random signal-meaning pairs become more structured and systematic when artificial agents play a ‘game of telephone’ with them. These results were replicated with human participants evolving a language-like system, confirming the importance of computer simulations in testing hypotheses on the cultural evolution of human behavior. Finally, recent work applied this approach to musical rhythm, showing that musical structures can indeed emerge via cultural transmission.
A recent paper in Artificial Life adopted this methodological approach, testing the emergence of a sound system at the boundary between music and language. As in communication systems found in humans, other animals and in silico experiments, a meaning space was paired with a signal space. The meaning space consisted of a set of pictures showing different facial emotional expressions. The signal space was a set of 5-note patterns. Crucially, the experimenters randomly paired meanings with signals, which were themselves randomly structured. These random pairings of emotional expressions and note sequences were then used in signaling games, where pairs of participants used note sequences to communicate emotional states.
The resulting signal-meaning pairs, with all their human-introduced variations, were then used in new signaling games with new participants. Over time, the small biases introduced at each artificial transmission step accumulated into quantifiable trends. In particular, the authors found that, over the course of these artificial generations of participants, features resembling some properties of language and music emerged.
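The transmission logic described above can be illustrated with a toy simulation. The sketch below is not the procedure or the analysis of the paper under discussion: the emotion labels, the eight-pitch alphabet, the error rate and, above all, the copying bias (a misreproduced note drifts one step toward the note before it) are hypothetical choices made only to show how a small per-generation bias can accumulate into a measurable trend. Only the 5-note signal length and the random initial pairing come from the text.

```python
import random

# Hypothetical meaning space: labels standing in for the emotion pictures.
EMOTIONS = ["happy", "sad", "angry", "afraid", "calm"]
# Hypothetical signal alphabet: eight abstract pitch levels.
PITCHES = list(range(8))
# The text describes 5-note patterns.
PATTERN_LENGTH = 5


def random_lexicon(rng):
    """Pair every meaning with a randomly structured 5-note signal."""
    return {
        meaning: tuple(rng.choice(PITCHES) for _ in range(PATTERN_LENGTH))
        for meaning in EMOTIONS
    }


def transmit(lexicon, rng, error_rate=0.2):
    """One transmission step with an assumed, illustrative copying bias.

    With probability `error_rate`, a note is misreproduced one step
    closer to the note that precedes it.
    """
    new_lexicon = {}
    for meaning, signal in lexicon.items():
        notes = list(signal)
        for i in range(1, len(notes)):
            if rng.random() < error_rate:
                if notes[i] > notes[i - 1]:
                    notes[i] -= 1
                elif notes[i] < notes[i - 1]:
                    notes[i] += 1
        new_lexicon[meaning] = tuple(notes)
    return new_lexicon


def mean_interval(lexicon):
    """Average absolute step between adjacent notes over all signals."""
    steps = [
        abs(b - a)
        for signal in lexicon.values()
        for a, b in zip(signal, signal[1:])
    ]
    return sum(steps) / len(steps)


def run_chain(generations=10, seed=0):
    """Pass a random lexicon down a chain and track one summary statistic."""
    rng = random.Random(seed)
    lexicon = random_lexicon(rng)
    history = [mean_interval(lexicon)]
    for _ in range(generations):
        lexicon = transmit(lexicon, rng)
        history.append(mean_interval(lexicon))
    return history


if __name__ == "__main__":
    for generation, value in enumerate(run_chain()):
        print(generation, round(value, 2))
```

Under this assumed bias, the mean interval between adjacent notes can only stay the same or shrink from one generation to the next, so the printed values trace a gradual drift away from the random starting point. A different assumed bias would produce a different trend; the point is only that the trend arises from repeated transmission, not from any single step.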