Neural Networks and Music Synthesis using Sound Data Sets
DOI: https://doi.org/10.14571/brajets.v15.nse2.141-152

Keywords: Neural networks, Machine Learning, Creative Processes, Dataset curation, Sound Synthesis

Abstract
This article presents a comparative study of two neural network architectures, Recurrent Neural Networks (RNN) and WaveNet, applied to sound synthesis and the analysis of sound datasets. Based on these two systems, we assess the state of the art of these technologies in contemporary sound creation in order to identify technical limitations and aesthetic possibilities for applying them in musical contexts. The relevance of the research, both for the field of sound creation and for the Brazilian context, lies in a critical study of the suitability of machine learning techniques for synthesis and of the aesthetic implications of this technology for compositional practice. At the current stage of the research, we conclude that these synthesis methods fall short of professional use: the sounds produced have a high noise level and low resolution, and rarely maintain compositional coherence over the duration of the samples. We also stress that implementing these systems in the Brazilian context is problematic, since training these models requires access to costly high-performance computing resources. A possible alternative to this problem of access to adequate infrastructure is subscription to cloud processing services; we emphasize, however, that these services are part of a monopoly held by technology companies located exclusively in the Global North.
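To make the comparison concrete, the sketch below (not taken from the article) outlines the two architecture families the abstract discusses: a recurrent next-sample predictor in the spirit of SampleRNN's LSTM layers, and a stack of dilated causal convolutions in the spirit of WaveNet. It uses PyTorch, which appears in the references; all layer sizes, class names, and the simplified structure (no gated activations or skip connections) are illustrative assumptions, not the configurations used in the study.

# Minimal sketch, for illustration only: two ways of modelling raw audio
# as a sequence of quantised samples. Layer sizes are arbitrary assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

QUANT_LEVELS = 256  # 8-bit quantisation of the waveform, as in WaveNet


class RecurrentSamplePredictor(nn.Module):
    """LSTM-based predictor of the next audio sample (SampleRNN-like spirit)."""

    def __init__(self, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(QUANT_LEVELS, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, QUANT_LEVELS)

    def forward(self, x):            # x: (batch, time) integer sample indices
        h, _ = self.lstm(self.embed(x))
        return self.head(h)          # (batch, time, QUANT_LEVELS) logits


class DilatedCausalStack(nn.Module):
    """WaveNet-style stack of dilated causal convolutions (no gating or skips)."""

    def __init__(self, channels=64, layers=8):
        super().__init__()
        self.embed = nn.Embedding(QUANT_LEVELS, channels)
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(layers)   # receptive field doubles with each layer
        )
        self.head = nn.Conv1d(channels, QUANT_LEVELS, kernel_size=1)

    def forward(self, x):            # x: (batch, time) integer sample indices
        h = self.embed(x).transpose(1, 2)        # (batch, channels, time)
        for conv in self.convs:
            pad = conv.dilation[0]               # left-pad so the model stays causal
            h = torch.relu(conv(F.pad(h, (pad, 0))))
        return self.head(h).transpose(1, 2)      # (batch, time, QUANT_LEVELS)


# Example: score one second of 16 kHz audio given as quantised indices.
x = torch.randint(0, QUANT_LEVELS, (1, 16000))
logits_rnn = RecurrentSamplePredictor()(x)       # (1, 16000, 256)
logits_cnn = DilatedCausalStack()(x)             # (1, 16000, 256)

Both sketches output a categorical distribution over quantised sample values at every time step; the recurrent model carries context in its hidden state, while the convolutional stack only sees a fixed receptive field, which is one practical reason why coherence over long time spans is hard to maintain.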
References

Aiva Technologies (2020). Aiva. Available at: <https://www.aiva.ai/>. Accessed: 25 Sep. 2020.
Amoore, Louise (2020). Cloud Ethics: Algorithms and the Attributes of Ourselves and Others. London: Duke University Press.
Arik, S. O. et al. (2017). Deep Voice: Real-time Neural Text-to-Speech. Accessed: 25 Sep. 2020.
Broussard, Meredith (2018). Artificial Unintelligence: How Computers Misunderstand the World. Cambridge: The MIT Press.
Caillon, Antoine and Esling, Philippe (2022). Streamable Neural Audio Synthesis with Non-Causal Convolutions. Available at: <https://arxiv.org/pdf/2204.07064.pdf>. Accessed: 15 Jun. 2023.
Carr, CJ and Zukowski, Zack (2017). Generating Black Metal and Math Rock: Beyond Bach, Beethoven and Beatles. 31st Conference on Neural Information Processing Systems (NIPS). Available at: <https://arxiv.org/abs/1811.06633>. Accessed: 27 Sep. 2020.
Carr, CJ and Zukowski, Zack (2018). Generating Albums with SampleRNN to Imitate Metal, Rock and Punk Bands. MUME. Available at: <https://arxiv.org/abs/1811.06633>. Accessed: 27 May 2021.
Carr, CJ and Zukowski, Zack (2019). Curating Generative Raw Audio Music with D.O.M.E. MILC. Available at: <http://ceur-ws.org/Vol-2327/IUI19WS-MILC-3.pdf>. Accessed: 27 May 2021.
Dadabots (2019). Relentless Doppelganger. Dadabots YouTube Channel. Available at: <https://www.youtube.com/watch?v=MwtVkPKx3RA>. Accessed: 28 Aug. 2021.
Dadabots (2021). Music Page. Dadabots. Available at: <https://dadabots.com/music.php>. Accessed: 28 Aug. 2021.
Dhariwal, Prafulla et al. (2020). Jukebox: A Generative Model of Music. OpenAI. Available at: <https://openai.com/blog/jukebox/>. Accessed: 28 Sep. 2020.
Dvs Sound (2017). Hybrid Vehicle with a LOM Elektrosluch 3+-HQ reversed 001. Dvs Sound YouTube Channel. Available at: <https://www.youtube.com/watch?v=kz0eL_RmCQg&t=83s>. Accessed: 25 Sep. 2020.
Eck, Douglas (2016). Welcome to Magenta! Google AI. Available at: <https://magenta.tensorflow.org/blog/2016/06/01/welcome-to-magenta/>. Accessed: 25 Sep. 2020.
Engel, Jesse et al. (2019). GANSynth: Adversarial Neural Audio Synthesis. Google AI. Available at: <https://openreview.net/forum?id=H1xQVn09FX>. Accessed: 25 Sep. 2020.
Engel, Jesse and Resnick, Cinjon et al. (2017). Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. Google Research. Available at: <https://research.google/pubs/pub46119/>. Accessed: 25 Sep. 2020.
Eubanks, Virginia (2018). Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor. New York: St. Martin's Press.
Facebook (2021). PyTorch. Available at: <https://pytorch.org>. Accessed: 13 Aug. 2021.
Fedden, Leon (2017). Comparative Audio Analysis with WaveNet, MFCCs, UMAP, t-SNE and PCA. Medium. Available at: <https://medium.com/@LeonFedden/comparative-audio-analysis-with-wavenet-mfccs-umap-t-sne-and-pca-cb8237bfce2f>. Accessed: 25 Jun. 2021.
Google (2021a). TensorFlow 2. Available at: <https://tensorflow.org>. Accessed: 13 Aug. 2021.
Google (2021b). Deep Dream Generator. Available at: <https://deepdreamgenerator.com>. Accessed: 27 Aug. 2021.
Graves, A. (2013). Generating Sequences with Recurrent Neural Networks. Available at: <https://arxiv.org/abs/1308.0850>. Accessed: 27 May 2021.
Gray, Mary L. and Suri, Siddharth (2019). Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. New York: Houghton Mifflin Harcourt Publishing Company.
Herndon, Holly (2021). Holly Plus. Never Heard Before Sound. Available at: <https://holly.plus>. Accessed: 27 Aug. 2021.
Hiner, Karl (2019). Generating Music with WaveNet and SampleRNN. Available at: <https://karlhiner.com/music_generation/wavenet_and_samplernn/>. Accessed: 27 Aug. 2021.
Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8): 1735-1780.
Huang, Cheng-Zhi et al. (2018). Music Transformer: Generating Music with Long-Term Structure. Cornell University. Available at: <https://arxiv.org/abs/1809.04281>. Accessed: 25 Sep. 2020.
Lemos, Gabriel Francisco (2016). Binah. Available at: <https://vimeo.com/358627864>. Accessed: 25 Aug. 2021.
Kalchbrenner, N. et al. (2018). Efficient Neural Audio Synthesis. Accessed: 27 May 2021.
Karpathy, A. (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available at: <http://karpathy.github.io/2015/05/21/rnn-effectiveness/>. Accessed: 27 May 2021.
Maaten, Laurens van der and Hinton, Geoffrey (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, vol. 9, pp. 2579-2605.
Mehri, Soroush, Kumar, Kundan, Gulrajani, Ishaan, Kumar, Rithesh, Jain, Shubham, Sotelo, Jose, Courville, Aaron C., and Bengio, Yoshua (2016). SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. CoRR, abs/1612.07837. Available at: <http://arxiv.org/abs/1612.07837>. Accessed: 25 Sep. 2020.
Melen, Christopher (2020). A Short History of Neural Synthesis. Manchester: Research Centres at the RNCM. Available at: <https://www.rncm.ac.uk/research/research-centres-rncm/prism/prism-blog/a-short-history-of-neural-synthesis/>. Accessed: 13 Aug. 2021.
Muntref (2020). AudioStellar. Muntref Centro de Arte y Ciencia. Available at: <https://audiostellar.xyz>. Accessed: 13 Aug. 2021.
Norvig, Peter and Russell, Stuart (2021). Artificial Intelligence: A Modern Approach. 4th ed. Pearson.
Perceptron (2011). Redes Neurais Artificiais Blogspot. Available at: <http://redesneuraisartificiais.blogspot.com/2011/06/perceptron-uma-breve-explicacao.html>. Accessed: 13 Aug. 2021.
Salem, Sam (2021). Prism-SampleRNN. GitHub. Available at: <https://github.com/rncm-prism/prism-samplernn>. Accessed: 28 May 2021.
Schubert, Alexander (2021). Switching Worlds. Wolke-Verlag. Available at: <https://www.wolke-verlag.de/wp-content/uploads/2021/02/SwitchingWorlds_DIGITAL_englisch_210222.pdf>. Accessed: 19 Feb. 2021.
Schultz, D. V. (2021). StyleGAN2-ADA. GitHub. Available at: <https://github.com/dvschultz/stylegan2-ada>. Accessed: 27 Aug. 2021.
Steyerl, Hito (2017). Duty Free Art: Art in the Age of Planetary Civil War. New York: Verso.
Van den Oord, Aäron et al. (2016). WaveNet: A Generative Model for Raw Audio. CoRR, abs/1609.03499. Available at: <http://arxiv.org/abs/1609.03499>. Accessed: 19 Sep. 2019.
Veen, Fjodor van (2016). The Neural Network Zoo. The Asimov Institute. Available at: <https://www.asimovinstitute.org/neural-network-zoo/>. Accessed: 25 Jun. 2021.
Vickers, Ben and Allado-McDowell, K. (eds.) (2021). Atlas of Anomalous AI. London: Ignota Books.
Wikipedia (2021). Linear Regression. Available at: <https://en.wikipedia.org/wiki/Linear_regression>. Accessed: 19 Feb. 2021.
Zhang, Jin (2008). Visualization for Information Retrieval. Berlin: Springer-Verlag.
License
Copyright (c) 2022 Gabriel Francisco Lemos
This work is licensed under a Creative Commons Attribution 4.0 International License.
BRAJETS follows the policy of Open Access Journals and provides immediate, free access to its content, on the principle that making scientific knowledge freely available to the public supports a greater global exchange of knowledge and a broader international democratization of knowledge. Accordingly, no fees apply for submission, evaluation, publication, viewing, or downloading of articles. Authors who publish in this journal therefore agree to the following terms:
A) Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution License (CC BY), which allows sharing of the work with acknowledgment of its authorship and of its initial publication in this journal.
B) Authors are authorized to distribute non-exclusively the version of the work published in this journal (e.g., in an institutional or non-institutional repository, or as a book chapter), with acknowledgment of authorship and initial publication in this journal.
C) Authors are encouraged to publish and distribute their work online (e.g., in online repositories or on their personal pages), which can increase the impact and citation of the published work.