Bibliography

ArikJD18

Sercan Ö. Arık, Heewoo Jun, and Gregory Diamos. Fast spectrogram inversion using multi-head convolutional neural networks. IEEE Signal Processing Letters, 26(1):94–98, 2018.

CE21

Antoine Caillon and Philippe Esling. RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. December 2021. arXiv:2111.05011, doi:10.48550/arXiv.2111.05011.

CLT+21

Michelle Carney, Chong Li, Edwin Toh, Ping Yu, and Jesse Engel. Tone Transfer: In-Browser Interactive Neural Audio Synthesis. In Joint Proceedings of the ACM IUI 2021 Workshops. 2021.

CMS22

Franco Caspe, Andrew McPherson, and Mark Sandler. DDX7: Differentiable FM Synthesis of Musical Instrument Sounds. In Proceedings of the 23rd International Society for Music Information Retrieval Conference. 2022.

CS23

Manuel Cherep and Nikhil Singh. SynthAX: A Fast Modular Synthesizer in JAX. In Audio Engineering Society Convention 155. May 2023. URL: http://www.aes.org/e-lib/browse.cfm?elib=22261.

DHS+23

Rodrigo Diaz, Ben Hayes, Charalampos Saitis, György Fazekas, and Mark Sandler. Rigid-body sound synthesis with differentiable modal resonators. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023.

EAC+19

Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, and Adam Roberts. GANSynth: adversarial neural audio synthesis. In International Conference on Learning Representations. 2019. URL: https://openreview.net/forum?id=H1xQVn09FX.

EHGR20

Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, and Adam Roberts. DDSP: Differentiable Digital Signal Processing. In 8th International Conference on Learning Representations. April 2020.

ERR+17

Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck, and Karen Simonyan. Neural audio synthesis of musical notes with WaveNet autoencoders. In International Conference on Machine Learning, 1068–1077. PMLR, 2017.

ESH+20

Jesse Engel, Rigel Swavely, Lamtharn Hanoi Hantrakul, Adam Roberts, and Curtis Hawthorne. Self-supervised pitch detection by inverse audio synthesis. In ICML 2020 Workshop on Self-supervision in Audio and Speech. 2020. URL: https://openreview.net/forum?id=RlVTYWhsky7.

Fan95

Gunnar Fant. The LF-model revisited. Transformations and frequency domain analysis. Speech Trans. Lab. Q. Rep., Royal Inst. of Tech. Stockholm, 2(3):40, 1995.

GEB15

Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A Neural Algorithm of Artistic Style. Journal of Vision, 16(12):326, 2015. doi:10.1167/16.12.326.

HSF21

Ben Hayes, Charalampos Saitis, and György Fazekas. Neural Waveshaping Synthesis. In Proceedings of the 22nd International Society for Music Information Retrieval Conference. Online, November 2021.

HSF23

Ben Hayes, Charalampos Saitis, and György Fazekas. Sinusoidal frequency estimation by gradient descent. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. IEEE, 2023.

HSF+23

Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, and Charalampos Saitis. A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis. August 2023. arXiv:2308.15422, doi:10.48550/arXiv.2308.15422.

HVL+23

Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda. The Singing Voice Conversion Challenge 2023. July 2023. arXiv:2306.14422, doi:10.48550/arXiv.2306.14422.

JVLP19

Marc Jordà, Pedro Valero-Lara, and Antonio J. Peña. Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs. IEEE Access, 7:70461–70473, 2019. doi:10.1109/ACCESS.2019.2918851.

JBYA19

Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, and Paavo Alku. GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram. In Proc. Interspeech 2019, 694–698. 2019. doi:10.21437/Interspeech.2019-2008.

KL62

J. L. Kelly and C. C. Lochbaum. Speech synthesis. In Proceedings of the Fourth International Congress on Acoustics, 1–4. Copenhagen, September 1962.

KSLB18

Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello. CREPE: A Convolutional Representation for Pitch Estimation. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 161–165. IEEE, 2018.

LJG23

Yunyi Liu, Craig Jin, and David Gunawan. DDSP-SFX: Acoustically-guided sound effects generation with differentiable digital signal processing. 2023. arXiv:2309.08060.

Mak75

John Makhoul. Linear prediction: a tutorial review. Proceedings of the IEEE, 63(4):561–580, 1975.

MS23

Naotake Masuda and Daisuke Saito. Improving Semi-Supervised Differentiable Synthesizer Sound Matching for Practical Applications. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:863–875, 2023. doi:10.1109/TASLP.2023.3237161.

Ner23

Shahan Nercessian. Differentiable WORLD Synthesizer-Based Neural Vocoder With Application To End-To-End Audio Style Transfer. In Audio Engineering Society Convention 154. May 2023. URL: https://www.aes.org/e-lib/browse.cfm?elib=22073 (visited on 2023-06-21).

NSW21

Shahan Nercessian, Andy Sarroff, and Kurt James Werner. Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 890–894. IEEE, 2021.

PPCS21

Jordi Pons, Santiago Pascual, Giulio Cengarle, and Joan Serrà. Upsampling artifacts in neural audio synthesis. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3005–3009. 2021. doi:10.1109/ICASSP39728.2021.9414913.

RMR22

Lenny Renault, Rémi Mignot, and Axel Roebel. Differentiable Piano Model for MIDI-to-Audio Performance Synthesis. In Proceedings of the 25th International Conference on Digital Audio Effects, 8. Vienna, Austria, 2022.

SFRK+23

Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, and Roland Badeau. Unsupervised music source separation using differentiable parametric source models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1276–1289, 2023. doi:10.1109/TASLP.2023.3252272.

SS90

Xavier Serra and Julius Smith. Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition. Computer Music Journal, 14(4):12–24, 1990. URL: https://www.jstor.org/stable/3680788 (visited on 2019-12-21), doi:10.2307/3680788.

SHC+22

Siyuan Shan, Lamtharn Hantrakul, Jitong Chen, Matt Avent, and David Trevelyan. Differentiable Wavetable Synthesis. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4598–4602. May 2022. ISSN: 2379-190X. URL: http://arxiv.org/abs/2111.10003 (visited on 2022-03-12), doi:10.1109/ICASSP43922.2022.9746940.

SCR+23

Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, and Andrew McPherson. Differentiable modelling of percussive audio with transient and spectral synthesis. In Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023. 2023.

Smi07

Julius O. Smith. Introduction to Digital Filters with Audio Applications. W3K Publishing, http://www.w3k.org/books/, 2007. ISBN 978-0-9745607-1-7.

SZL+23

Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, and Jinfeng Bai. DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. June 2023. doi:10.1109/ICASSP49357.2023.10095105.

SudholtCXR23

David Südholt, Mateo Cámara, Zhiyuan Xu, and Joshua D. Reiss. Vocal tract area estimation by gradient descent. In Proceedings of the 26th International Conference on Digital Audio Effects. Copenhagen, Denmark, 2023.

TST+21

Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, and Max Henry. One Billion Audio Sounds from GPU-enabled Modular Synthesis. In Proceedings of the 23rd International Conference on Digital Audio Effects. 2021.

vdOLB+18

Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, and Demis Hassabis. Parallel WaveNet: fast high-fidelity speech synthesis. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 3918–3926. PMLR, 10–15 Jul 2018. URL: https://proceedings.mlr.press/v80/oord18a.html.

vdODZ+16

Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alexander Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: a generative model for raw audio. 2016. arXiv:1609.03499. URL: https://arxiv.org/abs/1609.03499.

WTY19

Xin Wang, Shinji Takaki, and Junichi Yamagishi. Neural source-filter waveform models for statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:402–415, 2019.

WY19

Xin Wang and Junichi Yamagishi. Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. In 10th ISCA Workshop on Speech Synthesis (SSW 10), 1–6. ISCA, September 2019. doi:10.21437/SSW.2019-1.

WHY+22

Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, and Yi-Hsuan Yang. DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation. In Proceedings of the 23rd International Society for Music Information Retrieval Conference, 76–83. 2022.

WMD+22

Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, and Jesse Engel. MIDI-DDSP: Detailed control of musical performance via hierarchical modeling. In International Conference on Learning Representations. 2022. URL: https://openreview.net/forum?id=UseMOjWENv.

YXT+23

Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, and Yike Guo. NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 5869–5877. 2023. doi:10.24963/ijcai.2023/651.

YF23

Chin-Yun Yu and György Fazekas. Singing voice synthesis using differentiable LPC and glottal-flow-inspired wavetables. 2023. arXiv:2306.17252.

BarahonaRiosC23

Adrián Barahona-Ríos and Tom Collins. NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks. July 2023. arXiv:2307.08007, doi:10.48550/arXiv.2307.08007.