Classification of bird sounds – Author Interview

Today we published an article that combines feature learning – an automatic analysis technique – with a classification algorithm to create a system that can detect which bird species are present in a large audio dataset. This automatic large-scale classification could be useful for expert and amateur bird-watchers alike.

We were very interested in hearing more from Dan Stowell about this successful way of identifying bird sounds from large audio collections.

PJ: Can you tell us a bit about yourself?

DS: I’m a research fellow at QMUL in London, and I’m working on applying machine learning techniques to analyze bird sounds. I develop techniques to answer questions such as “What species of bird?”, “How many birds?”, “Are they calling to each other, or ignoring each other?” just by automatically analyzing the audio content.

It’s a fascinating topic because bird vocalizations have so much rich structure – you can tell just by listening – and we are a long way from understanding all of that structure. So I’m developing tools that can help us analyze these sounds. On the one hand I make use of what we know about bird sounds, and on the other hand these tools will enable us to find out more about bird sounds by analyzing large amounts of sound recordings.


PJ: Can you briefly explain the research you published in PeerJ?

DS: The research is about automatically classifying bird species from a sound recording. Simple concept: you have a sound recording, but you’re no bird expert, so you want the machine to tell you which species are present. Or maybe you’re a bird expert but you have thousands of sound recordings because you run a sound archive or an ecological monitoring project. Either way, it’s valuable to have some automated way to work out which species are present in each recording. So we apply “machine learning” to learn from labeled examples and generalize to unlabeled examples.

People have published research on species classification since at least 1997, but often it’s been on small datasets – for example a personal collection covering ten or twenty species. In real outdoor recordings there are hundreds of possible bird species. And the more species you have to choose between, the harder the task becomes.

The specific contribution of this paper is to apply a technique called “unsupervised feature learning”, which can dramatically improve classification performance by automatically finding a high-dimensional transformation of the audio data. We put this together with a modern classification algorithm (“random forest”) to create a bird sound classifier that performs very well even on a very big dataset: thousands of recordings covering more than 500 species in Brazil.
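For readers curious what this pipeline looks like in practice, here is a heavily simplified sketch of the general idea in Python with scikit-learn: learn a dictionary of spectrogram patches without labels (plain k-means here, standing in for the paper’s feature learning), represent each recording as a histogram of learned features, and feed that to a random forest. All data, sizes, and parameters below are toy illustrations, not the paper’s actual configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for spectrograms: 40 "recordings", each a (freq x time) array,
# drawn from two made-up classes with energy in different frequency bands.
def toy_spectrogram(label):
    spec = rng.random((16, 100))
    lo, hi = (4, 8) if label == 0 else (10, 14)
    spec[lo:hi] += 2.0  # class-dependent energy band
    return spec

labels = np.array([i % 2 for i in range(40)])
specs = [toy_spectrogram(y) for y in labels]

# 1. Unsupervised feature learning: cluster small spectrogram patches,
#    so each cluster centre acts as a learned "jigsaw piece".
def extract_patches(spec, width=4):
    return np.stack([spec[:, t:t + width].ravel()
                     for t in range(spec.shape[1] - width)])

all_patches = np.vstack([extract_patches(s) for s in specs])
codebook = KMeans(n_clusters=32, n_init=4, random_state=0).fit(all_patches)

# 2. Represent each recording by how often each learned feature "fires"
#    (a normalised histogram over nearest cluster centres).
def encode(spec):
    idx = codebook.predict(extract_patches(spec))
    return np.bincount(idx, minlength=32) / len(idx)

X = np.stack([encode(s) for s in specs])

# 3. Classify the learned representations with a random forest.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))  # training accuracy on the toy data
```

The key point the sketch illustrates is that step 1 needs no labels at all, which is why this kind of representation can keep improving as the pool of (unlabeled) audio grows.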

PJ: What surprised you the most with these results?

DS: The really tricky thing that we found is that you get very different results on small datasets and on big datasets. So, imagine for example that you’re developing a new method, and you test it on twenty recordings, just as a quick test so you can decide whether or not to apply it to a million recordings. And I’m not talking about statistical significance here: let’s assume that twenty is enough to find a significant difference between two techniques. The real killer is that the results from twenty might point the other way from the results you’d get from a million. They might seem to show your method wouldn’t work, when in fact it would! The reason for this is that some techniques, in particular this unsupervised feature learning, really get their strength from large datasets. You can see it really clearly in our results, where the new technique seems a bit pointless on the smallest dataset, but then the benefit becomes really clear as the datasets get bigger.

Another surprise was that we applied our technique, which in a sense learns little “jigsaw pieces” that go to make up a bird sound, and then later on we discovered that neuroscience research had found similar jigsaw pieces represented in the sensitivities of bird auditory neurons. I’m not claiming that our system does the same thing as a bird’s hearing system – it’s not designed to – but it’s a hint that we’re doing something right.

PJ: Where do you hope to go from here?

DS: The next topic I’m working on is how to get more information out than just a species label. I’m working on techniques that can transcribe all the bird sounds in an audio scene: not just who is talking, but when, in response to whom, and what relationships are reflected in the sound (e.g. dominance, pair-bonding). To pull all this information out of unlabeled sound, we need to apply more maths!

PJ: Why did you choose to reproduce the complete peer-review history of your article?

DS: I think it’s a great idea to publish peer reviews. Often they’re the most focused expert feedback you’ll get on your work, and it’s good to make the most of that expertise by letting future readers see how the reviewers reacted to the paper. I still think anonymity is important in peer review (to reduce the risk of biased judgments), so I think PeerJ’s approach here is a good one.

PJ: How did you first hear about PeerJ, and what persuaded you to submit to us?

DS: I heard of PeerJ from my biologist colleagues, and I noticed quite a few of them publishing there, so I asked around. The speed of the publishing process was a definite plus for me, as well as it being properly open-access, and the very readable way articles are formatted online.

PJ: How would you describe your experience of our submission/review process?

DS: Smooth and efficient, and yes, it was fast compared with all my previous experience with journals.

PJ: Did you get any comments from your colleagues about your publication with PeerJ?

DS: I’m based in an Electronic Engineering / Computer Science department and most of my colleagues hadn’t heard of PeerJ. It seems that the biological sciences are doing pretty well for modern open-access journals, and it’d be nice for EE/CS to catch up.

PJ: Anything else you would like to talk about?

DS: I should mention one thing that made this “big data” bird sound research possible. There are various audio archives in the world, but for machine learning research we need public datasets, and preferably open-licensed datasets.

One of the sources we used was Xeno Canto, which crowdsources many thousands of recordings under open licenses. Another was a recent French research project called SABIOD, which has performed a massive service for the community by creating research challenges – public challenges to get the best score on tasks such as bird species sound classification. Challenges are a great way to set benchmarks for specific tasks, really understand where the state of the art is, and stretch it a little bit further. They (SABIOD) coordinated a challenge using the Brazil data that I mentioned, and I’m pleased to say our system was the best-performing audio-only classifier that was entered in the challenge.

PJ: Many thanks for your time!

DS: And thank you!

Join Dan Stowell and thousands of other satisfied authors, and submit your next article to PeerJ. And until the end of August, if you engage with PeerJ articles or preprints, then you can publish for free!
