Introduction#

Welcome to the online supplement to the tutorial “Introduction to Differentiable Audio Synthesiser Programming”, presented at the 24th International Society for Music Information Retrieval Conference (ISMIR) at Politecnico di Milano, November 5th to 9th, 2023. This web book contains all content presented at the tutorial, including code examples and references, along with further material that goes into greater depth on the topics covered.

Motivation & Aims#

The field of differentiable digital signal processing (DDSP) emerged with the incorporation of components, such as linear synthesis filters [JBYA19] and spectral modelling synthesisers [EHGR20], into the end-to-end training of neural networks for speech and musical instrument synthesis. It has since grown to encompass applications including pitch estimation [ESH+20], source separation [SFRK+23], physical parameter estimation [SudholtCXR23], synthesiser sound matching [MS23], and beyond. By introducing a strong inductive bias, DDSP methods can often lead to a reduction in model complexity and data requirements, and provide an intermediate representation that is inherently interpretable in terms of familiar parametric signal processors.

Yet despite the growing popularity of such methods in research, the implementation of differentiable audio synthesizers is not always clearly documented, and the simple formulation of many synthesizers can obscure what often turns out to be complex optimization behaviour. This tutorial aims to address this gap through an introduction to the fundamentals of differentiable synthesizer programming. In particular, we hope that researchers in adjacent fields may find applications for these techniques in their work.

Who is this for?#

This book and the accompanying tutorial are aimed at music and audio researchers and engineers who wish to gain a practical understanding of how differentiable digital signal processors can be implemented and used in audio synthesis. The content is targeted at those with a grounding in the fundamentals of signal processing and machine learning, although references to educational resources are provided where relevant for those who wish to refresh or supplement their existing knowledge. We assume prior knowledge of Python 3. The tutorial content is written using the PyTorch machine learning framework. We do not assume prior experience with PyTorch, and attempt to provide sufficient background to allow the tutorial content to be understood. Nonetheless, we encourage those who are totally new to PyTorch to explore the official tutorials for further context.

A note on scope: we’re not training neural networks#

The majority of applications of DDSP involve its use in combination with a neural network, which typically produces the parameters of the signal processing operation (e.g. the cutoff frequency of a filter, or harmonic amplitudes). In this tutorial and web book, however, we focus only on the components of this system after the parameters are given — that is, we concern ourselves only with the differentiable signal processing. Nonetheless, all of the techniques we present can be (and mostly have been) composed with a neural network, as a direct consequence of their implementation using differentiable operations in a machine learning framework.
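To make this concrete, here is a minimal sketch, not taken from the tutorial notebooks, of what working “after the parameters are given” looks like in practice. The hypothetical harmonic_oscillator function below is built entirely from differentiable PyTorch operations, so an audio-domain loss can be backpropagated into the harmonic amplitudes; in a full DDSP system, those same gradients would instead flow onward into the neural network that predicted them.

import math
import torch

def harmonic_oscillator(f0, amplitudes, sample_rate=16000, n_samples=16000):
    """Render a sum of harmonics of f0, weighted by one amplitude per harmonic."""
    t = torch.arange(n_samples) / sample_rate
    harmonic_numbers = torch.arange(1, amplitudes.shape[0] + 1)
    # (n_harmonics, n_samples): sinusoids at integer multiples of f0
    phases = 2 * math.pi * f0 * harmonic_numbers[:, None] * t[None, :]
    return (amplitudes[:, None] * torch.sin(phases)).sum(dim=0)

# Recover the harmonic amplitudes of a target tone by gradient descent.
target = harmonic_oscillator(220.0, torch.tensor([1.0, 0.5, 0.25]))
amps = torch.zeros(3, requires_grad=True)  # parameters optimised directly
optimiser = torch.optim.Adam([amps], lr=1e-2)
for _ in range(200):
    optimiser.zero_grad()
    loss = torch.nn.functional.mse_loss(harmonic_oscillator(220.0, amps), target)
    loss.backward()  # gradients flow through the synthesiser into the amplitudes
    optimiser.step()

Here the amplitudes are optimised directly by gradient descent rather than predicted by a network, which is precisely the setting this book concentrates on.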

Getting Started#

This web book consists of a series of Jupyter notebooks, which can be explored statically on this page. To run the notebooks yourself, you will need to clone the Git repository:

git clone https://github.com/intro2ddsp/intro2ddsp.github.io.git && cd intro2ddsp.github.io

Then, you should create a Python virtual environment, using virtualenv, conda, or similar:

python -m venv venv && source venv/bin/activate

or:

conda create --name intro2ddsp python=3.10 && conda activate intro2ddsp

Next, you should install the dependencies:

pip install -r requirements.txt

And finally, you can launch the Jupyter notebook server:

jupyter notebook
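If you would like to verify the environment before opening the notebooks, a quick check is to import PyTorch (which the tutorial code relies on) from the command line; the exact version printed will depend on the pins in requirements.txt:

python -c "import torch; print(torch.__version__)"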

About the Authors#

Ben Hayes is a final year PhD student at the Centre for Digital Music’s CDT in Artificial Intelligence and Music, Queen Mary University of London. His research focuses on expanding the capabilities of differentiable digital signal processing by resolving optimisation pathologies caused by symmetry. His work has been accepted to leading conferences in the field, including ISMIR, ICLR, ICASSP, ICA, and the AES Convention, and published in the Journal of the Audio Engineering Society. He has worked as a research intern at Sony Computer Science Laboratories in Paris and with ByteDance’s Speech Audio and Music Intelligence team in London. He was also Music Lead at the award-winning generative music startup Jukedeck, and an internationally touring musician signed to R&S Records.

Jordie Shier is a first year PhD student in the Artificial Intelligence and Music (AIM) programme based at Queen Mary University of London (QMUL), studying under the supervision of Prof. Andrew McPherson and Dr. Charalampos Saitis. His research is focused on the development of novel methods for synthesizing audio and the creation of new interaction paradigms for music synthesizers. His current PhD project is on real-time timbral mapping for synthesized percussive performance and is being conducted in collaboration with Ableton. He was a co-organizer of the 2021 Holistic Evaluation of Audio Representations (HEAR) NeurIPS challenge and his work has been published in PMLR, DAFx, and the JAES. Previously, he completed an MSc in Computer Science and Music under the supervision of Prof. George Tzanetakis and Assoc. Prof. Kirk McNally.

Chin-Yun Yu is a first year PhD student in the Artificial Intelligence and Music (AIM) programme based at Queen Mary University of London (QMUL), under the supervision of Dr György Fazekas. His current research centres on leveraging signal processing and deep generative models for controllable, expressive vocal synthesis. In addition, he is dedicated to open science and reproducible research, developing open-source packages and contributing to public research projects. He received a BSc in Computer Science from National Chiao Tung University in 2018 and was a research assistant at the Institute of Information Science, Academia Sinica, supervised by Prof. Li Su. His recent work has been published at ICASSP.

David Südholt is a first year PhD student in the Artificial Intelligence and Music (AIM) programme based at Queen Mary University of London (QMUL). Supervised by Prof. Joshua Reiss, he is researching parameter estimation for physical modelling synthesis, focussing on the synthesis and expressive transformation of the human voice. He received an MSc degree in Sound and Music Computing from Aalborg University Copenhagen in 2022, where he was supervised by Prof. Stefania Serafin and Assoc. Prof. Cumhur Erkut. His work has been published at the SMC conference and in the IEEE/ACM Transactions on Audio, Speech and Language Processing.

Rodrigo Diaz is a PhD candidate in Artificial Intelligence and Music at Queen Mary University of London, under the supervision of Prof. Mark Sandler and Dr. Charalampos Saitis. Rodrigo’s work has been published in leading computer vision and audio conferences, including CVPR, ICASSP, IC3D, and the AES Conference on Headphone Technology. Before starting his PhD studies, he worked as a researcher in the Immersive Communications group at the Fraunhofer Heinrich Hertz Institute (HHI) in Berlin, where he investigated volumetric reconstruction from images using neural networks. His current research focuses on real-time neural audio synthesis for 3D objects and drums. Rodrigo’s interdisciplinary background includes a Master’s degree in Media Arts and Design from Bauhaus University in Weimar and a Bachelor of Music from Texas Christian University.