Neural Networks and Music Synthesis using Sound Data Sets

Authors

  • Gabriel Francisco Lemos, Universidade de São Paulo, USP

DOI:

https://doi.org/10.14571/brajets.v15.nse2.141-152

Keywords:

Neural Networks, Machine Learning, Creative Processes, Dataset Curation, Sound Synthesis

Abstract

This article presents a comparative study of two neural network architectures applied to sound synthesis and the analysis of sound datasets: Recurrent Neural Networks (RNN) and WaveNet. Based on these two systems, we evaluate the state of the art of these technologies in contemporary sound creation in order to identify technical limitations and aesthetic possibilities for their application in musical contexts. The relevance of the research, both for the field of sound creation and for the Brazilian context, lies in a critical study of the adequacy of machine learning techniques for synthesis and of the aesthetic implications of this technology for compositional practice. At the current stage of the research, we conclude that these synthesis methods still fall short of professional use: the sounds produced have a high noise level, low resolution, and rarely maintain compositional coherence over the duration of the samples. We also emphasize that implementing these systems in the Brazilian context is problematic, since training these models requires access to costly high-performance computational resources. A possible alternative to this lack of adequate infrastructure is the subscription to cloud processing services; we stress, however, that these services belong to a monopoly of technology companies located exclusively in the Global North.
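As an illustration of the low resolution and noise reported above, the sketch below (not taken from the article; the 8-bit, 16 kHz settings are assumptions based on the configuration commonly used with WaveNet and SampleRNN) shows the mu-law companding these models typically apply to raw audio before modelling it autoregressively, and measures how much signal-to-noise ratio the quantization alone gives up.

# A minimal sketch, not from the article: 8-bit mu-law companding of the kind
# WaveNet and SampleRNN typically apply to raw audio before autoregressive
# modelling. The 256-level quantization (here at 16 kHz) is one concrete source
# of the "low resolution" and audible noise the abstract reports.
import numpy as np

MU = 255  # 256 discrete levels, i.e. 8-bit output classes (assumed, typical setting)


def mu_law_encode(audio, mu=MU):
    """Map float audio in [-1, 1] to integer classes in [0, mu]."""
    audio = np.clip(audio, -1.0, 1.0)
    companded = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return np.floor((companded + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def mu_law_decode(codes, mu=MU):
    """Invert the companding; the quantization error cannot be recovered."""
    companded = 2.0 * (codes.astype(np.float64) / mu) - 1.0
    return np.sign(companded) * ((1.0 + mu) ** np.abs(companded) - 1.0) / mu


if __name__ == "__main__":
    sample_rate = 16_000  # sample rate commonly used by both model families
    t = np.linspace(0.0, 1.0, sample_rate, endpoint=False)
    tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)  # one second of A4
    restored = mu_law_decode(mu_law_encode(tone))
    error = tone - restored
    snr_db = 10.0 * np.log10(np.sum(tone ** 2) / np.sum(error ** 2))
    print(f"8-bit mu-law round-trip SNR: {snr_db:.1f} dB")  # well below 16-bit audio

For a sine wave at half of full scale, the round-trip SNR lands on the order of 30-40 dB, far below the roughly 96 dB of 16-bit studio audio, which is consistent with the fidelity limitations the study reports even before any modelling error is added.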

Author Biography

Gabriel Francisco Lemos, Universidade de São Paulo, USP

Research Center on Sonology, University of São Paulo (USP); Group on Artificial Intelligence and Art, GAIA-InovaUSP

References

Aiva Technologies (2020). Aiva. Available at: <https://www.aiva.ai/>. Accessed on: Sept. 25, 2020.

Amoore, Louise (2020). Cloud Ethics: Algorithms and the Attributes of Ourselves and Others. London: Duke University Press.

Arik, S. O. et al. (2017). Deep Voice: Real-time Neural Text-to-Speech. Accessed on: Sept. 25, 2020.

Broussard, Meredith (2018). Artificial Unintelligence: How Computers Misunderstand the World. Cambridge: The MIT Press.

Caillon, Antoine and Esling, Philippe (2022). Streamable Neural Audio Synthesis with Non-Causal Convolutions. Available at: <https://arxiv.org/pdf/2204.07064.pdf>. Accessed on: June 15, 2023.

Carr, CJ and Zukowski, Zack (2017). Generating Black Metal and Math Rock: Beyond Bach, Beethoven and Beatles. 31st Conference on Neural Information Processing Systems (NIPS). Available at: <https://arxiv.org/abs/1811.06633>. Accessed on: Sept. 27, 2020.

Carr, CJ and Zukowski, Zack (2018). Generating Albums with SampleRNN to Imitate Metal, Rock and Punk Bands. MUME. Available at: <https://arxiv.org/abs/1811.06633>. Accessed on: May 27, 2021.

Carr, CJ and Zukowski, Zack (2019). Curating Generative Raw Audio Music with D.O.M.E. MILC. Available at: <http://ceur-ws.org/Vol-2327/IUI19WS-MILC-3.pdf>. Accessed on: May 27, 2021.

Dadabots (2019). Relentless Doppelganger. Dadabots YouTube Channel. Available at: <https://www.youtube.com/watch?v=MwtVkPKx3RA>. Accessed on: Aug. 28, 2021.

Dadabots (2021). Music Page. Dadabots. Available at: <https://dadabots.com/music.php>. Accessed on: Aug. 28, 2021.

Dhariwal, Prafulla et al. (2020). Jukebox: A Generative Model of Music. OpenAI. Available at: <https://openai.com/blog/jukebox/>. Accessed on: Sept. 28, 2020.

Dvs Sound (2017). Hybrid Vehicle with a LOM Elektrosluch 3+-HQ reversed 001. Dvs Sound YouTube Channel. Available at: <https://www.youtube.com/watch?v=kz0eL_RmCQg&t=83s>. Accessed on: Sept. 25, 2020.

Eck, Douglas (2016). Welcome to Magenta! Google AI. Available at: <https://magenta.tensorflow.org/blog/2016/06/01/welcome-to-magenta/>. Accessed on: Sept. 25, 2020.

Engel, Jesse et al. (2019). GANSynth: Adversarial Neural Audio Synthesis. Google AI. Available at: <https://openreview.net/forum?id=H1xQVn09FX>. Accessed on: Sept. 25, 2020.

Engel, Jesse and Resnick, Cinjon, et al. (2017). Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. Google Research. Available at: <https://research.google/pubs/pub46119/>. Accessed on: Sept. 25, 2020.

Eubanks, Virginia (2018). Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor. New York: St. Martin's Press.

Facebook (2021). PyTorch. Available at: <https://pytorch.org>. Accessed on: Aug. 13, 2021.

Fedden, Leon (2017). Comparative Audio Analysis with WaveNet, MFCCs, UMAP, t-SNE and PCA. Medium. Available at: <https://medium.com/@LeonFedden/comparative-audio-analysis-with-wavenet-mfccs-umap-t-sne-and-pca-cb8237bfce2f>. Accessed on: June 25, 2021.

Google (2021a). TensorFlow 2. Available at: <https://tensorflow.org>. Accessed on: Aug. 13, 2021.

Google (2021b). Deep Dream Generator. Available at: <https://deepdreamgenerator.com>. Accessed on: Aug. 27, 2021.

Graves, A. (2013). Generating Sequences with Recurrent Neural Networks. Available at: <https://arxiv.org/abs/1308.0850>. Accessed on: May 27, 2021.

Gray, Mary L. and Suri, Siddharth (2019). Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. New York: Houghton Mifflin Harcourt Publishing Company.

Herndon, Holly (2021). Holly Plus. Never Heard Before Sound. Available at: <https://holly.plus>. Accessed on: Aug. 27, 2021.

Hiner, Karl (2019). Generating Music with WaveNet and SampleRNN. Available at: <https://karlhiner.com/music_generation/wavenet_and_samplernn/>. Accessed on: Aug. 27, 2021.

Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8): 1735-1780.

Huang, Cheng-Zhi, et al. (2018). Music Transformer: Generating Music with Long-Term Structure. Cornell University. Available at: <https://arxiv.org/abs/1809.04281>. Accessed on: Sept. 25, 2020.

Kalchbrenner, N. et al. (2018). Efficient Neural Audio Synthesis. Accessed on: May 27, 2021.

Karpathy, A. (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available at: <http://karpathy.github.io/2015/05/21/rnn-effectiveness/>. Accessed on: May 27, 2021.

Lemos, Gabriel Francisco (2016). Binah. Available at: <https://vimeo.com/358627864>. Accessed on: Aug. 25, 2021.

Maaten, Laurens van der and Hinton, Geoffrey (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9: 2579-2605.

Mehri, Soroush, Kumar, Kundan, Gulrajani, Ishaan, Kumar, Rithesh, Jain, Shubham, Sotelo, Jose, Courville, Aaron C., and Bengio, Yoshua (2016). SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. CoRR, abs/1612.07837. Available at: <http://arxiv.org/abs/1612.07837>. Accessed on: Sept. 25, 2020.

Melen, Christopher (2020). A Short History of Neural Synthesis. Manchester: Research Centres at the RNCM. Available at: <https://www.rncm.ac.uk/research/research-centres-rncm/prism/prism-blog/a-short-history-of-neural-synthesis/>. Accessed on: Aug. 13, 2021.

Muntref (2020). AudioStellar. Muntref Centro de Arte y Ciencia. Available at: <https://audiostellar.xyz>. Accessed on: Aug. 13, 2021.

Norvig, Peter and Russell, Stuart (2021). Artificial Intelligence: A Modern Approach. 4th Edition. Pearson.

Perceptron (2011). Redes Neurais Artificiais Blogspot. Available at: <http://redesneuraisartificiais.blogspot.com/2011/06/perceptron-uma-breve-explicacao.html>. Accessed on: Aug. 13, 2021.

Salem, Sam (2021). Prism-SampleRNN. GitHub. Available at: <https://github.com/rncm-prism/prism-samplernn>. Accessed on: May 28, 2021.

Schubert, Alexander (2021). Switching Worlds. Wolke-Verlag. Available at: <https://www.wolke-verlag.de/wp-content/uploads/2021/02/SwitchingWorlds_DIGITAL_englisch_210222.pdf>. Accessed on: Feb. 19, 2021.

Schultz, D. V. (2021). StyleGAN2-ADA. GitHub. Available at: <https://github.com/dvschultz/stylegan2-ada>. Accessed on: Aug. 27, 2021.

Steyerl, Hito (2017). Duty Free Art: Art in the Age of Planetary Civil War. New York: Verso.

Van Den Oord, Aäron et al. (2016). WaveNet: A Generative Model for Raw Audio. CoRR, abs/1609.03499. Available at: <http://arxiv.org/abs/1609.03499>. Accessed on: Sept. 19, 2019.

Veen, Fjodor van (2016). The Neural Network Zoo. The Asimov Institute. Available at: <https://www.asimovinstitute.org/neural-network-zoo/>. Accessed on: June 25, 2021.

Vickers, Ben and Allado-McDowell, K. (eds.) (2021). Atlas of Anomalous AI. London: Ignota Books.

Wikipedia (2021). Linear Regression. Available at: <https://en.wikipedia.org/wiki/Linear_regression>. Accessed on: Feb. 19, 2021.

Zhang, Jin (2008). Visualization for Information Retrieval. Berlin: Springer-Verlag.

Published

2022-12-22

Issue

Section

Article