Preprints, Working Papers, ...

What does the Canary Say? Low-Dimensional GAN Applied to Birdsong

Silvia Pagliarini 1 Nathan Trouvain 1 Arthur Leblois 2 Xavier Hinaut 1
1 Mnemosyne - Mnemonic Synergy
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest, IMN - Institut des Maladies Neurodégénératives [Bordeaux]
Abstract: The generation of speech, and more generally complex animal vocalizations, by artificial systems is a difficult problem. Generative Adversarial Networks (GANs) have shown very good abilities for generating images, and more recently sounds. While current GANs have high-dimensional latent spaces, complex vocalizations could in principle be generated through a low-dimensional latent space, easing the visualization and evaluation of latent representations. In this study, we test the ability of a previously developed GAN, called WaveGAN, to reproduce canary syllables while drastically reducing the latent space dimension. We trained WaveGAN on a large dataset of canary syllables (16000 renditions of 16 different syllable types) and varied the latent space dimension from 1 to 6. The sounds produced by the generator are evaluated using an RNN-based classifier. This quantitative evaluation is paired with a qualitative evaluation of the GAN productions across training epochs and latent dimensions. Altogether, our results show that a 3-dimensional latent space is enough to produce all syllable types in the repertoire with a quality often indistinguishable from real canary vocalizations. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP [1] to qualitatively show the similarities between training and generated data, and between the generated syllables and the interpolations produced. We discuss how our study may provide tools to train simple models of vocal production and/or learning. Indeed, while the RNN-based classifier provides a biologically realistic representation of the auditory network processing vocalizations, the low-dimensional GAN may be used for the production of complex vocal repertoires.
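The interpolation between syllable types mentioned in the abstract can be pictured as walking along a straight line between two points of the 3-dimensional latent space and passing each intermediate point to the trained generator. The following is only a minimal sketch of that idea, not the authors' code: the function names `sample_latent` and `interpolate` are hypothetical, the uniform prior on [-1, 1] is an assumption, and the generator itself is left out.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector; a uniform prior on [-1, 1] is assumed here."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the segment from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0 inclusive
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


rng = random.Random(0)          # fixed seed so the sketch is reproducible
z_a = sample_latent(3, rng)     # hypothetical latent point for one syllable
z_b = sample_latent(3, rng)     # hypothetical latent point for another syllable
path = interpolate(z_a, z_b, steps=5)

# Each vector in `path` would then be fed to a trained generator
# (not shown) to synthesize one intermediate sound.
for z in path:
    print([round(v, 3) for v in z])
```

With only three coordinates per vector, such a path can be plotted directly, which is the kind of visualization a low-dimensional latent space makes easy.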
Contributor : Xavier Hinaut
Submitted on : Friday, November 26, 2021 - 9:12:06 PM
Last modification on : Tuesday, January 4, 2022 - 6:17:14 AM




  • HAL Id : hal-03244723, version 2



Silvia Pagliarini, Nathan Trouvain, Arthur Leblois, Xavier Hinaut. What does the Canary Say? Low-Dimensional GAN Applied to Birdsong. 2021. ⟨hal-03244723v2⟩


