EP06: AI Transforming Entertainment, Music, and Politics, and Elite Prompt Engineering
And the Oscar for Best Picture goes to... Larry and his 4090 GPU
Howdy, prompt engineers and AI enthusiasts!
In this week’s issue: AI is changing politics, entertainment, and nearly every other field. But this revolution also comes with the potential for misuse and copyright infringement, and regulations need to be put in place to protect the rights of original creators. In this episode of HTTTA, Wes and Goda Go discuss AI politics, cloning, immortality and proper American English, along with how AI is transforming the entertainment landscape.
Podcast Page: https://howtotalkto.ai
HTTTA Newsletter:
MidJourney Master Reference Guide: bit.ly/3obnUNU
ChatGPT Master Reference Guide: bit.ly/3obo7AG
Learn Prompting: https://learnprompting.org/
Discord (Goda Go#3156 & Commordore_Wesmardo#2912)
Goda Go on Youtube: /@godago
Wes the Synthmind's everything: https://linktr.ee/synthminds
Key Takeaways from the Podcast:
We all agree on the importance of transparency when it comes to AI-generated content, and YouTube should include disclaimers when relevant. Currently, AI content detectors are not advanced enough to detect subtle differences in audio, and people have often been unable to tell when AI-generated audio has been used.
The issue of copyright infringement and royalty payments needs to be addressed in order to protect the rights of the original creators. Technology is revolutionizing the entertainment industry, from AI-generated songs to video created from text. Companies must balance the potential for misuse against the potential for creativity, and the industry may be completely changed by 2024 with the rise of AI-generated video.
AI is a creativity enabler. People with diverse skills and experiences can use AI to help them create content.
1. A Glimpse into the Future: AI-Generated Campaign Ads and the Challenges of Misinformation
In a startling demonstration of the power and potential of artificial intelligence (AI), an entire campaign advertisement for the 2024 US presidential election, endorsed by the GOP, was created using AI-generated photos, music, and editing. While technologically impressive, this remarkable feat raises alarm bells regarding the potential for AI-driven misinformation and the pressing need for regulatory intervention.
The advertisement, which was indistinguishable from a traditional, human-led campaign piece, showcased the capabilities of AI and its ability to create realistic, engaging content. By employing AI-generated images, the GOP was able to depict a fear-laden, false future that it claims will occur if the current US president is re-elected. Moreover, the editing, voices, images, and sound effects were all AI-generated (they missed our issue last week on the telltale signs that an image is AI-generated).
At the same time, the use of AI-generated music showcased the revolutionary algorithmic compositions that are now possible. The audio component of the campaign ad was both contextually appropriate and emotionally evocative, indicating that the AI algorithm was able to effectively match the tone and theme of the ad.
Yet, while marveling at the remarkable advancements in AI capabilities, it is crucial to recognize the potential challenges and dangers this technology brings.
The most pressing concern is the potential for AI-driven misinformation. The ease with which realistic content can now be produced means that malicious actors can create convincing and manipulative propaganda. Furthermore, AI-generated deepfakes, which have been causing concern in recent years, are becoming increasingly difficult to detect. The marriage of AI-generated visuals and audio could lead to counterfeit content that portrays politicians or public figures saying or doing things they never actually did, damaging their reputations and potentially influencing public opinion.
To combat this threat, there is a growing need for regulation and ethical guidelines in the field of AI-generated content. Policymakers, technologists, and ethicists must work together to create solutions that prevent the spread of AI-generated misinformation. This may include implementing security measures that verify the origin of digital content, for example with blockchain technology, and establishing penalties for bad actors who exploit AI technology. Additionally, promoting media literacy and educating the public about the dangers of AI-generated content will help bolster the public's ability to discern real information from manipulated content.
While creating an entire campaign ad with AI is an impressive feat in itself, it also sounds a clarion call for regulation and education to mitigate the risks of misinformation. It is vital that we harness AI's capabilities for the betterment of society while addressing and avoiding the potential hazards it brings along. Only then can we truly understand the promise and perils of this game-changing technology in shaping our future.
(This link is posted for educational and awareness purposes only, and is not an endorsement for any political party, candidate, or ideological viewpoints.)
2. ELI5 AI Term of the week: “Backpropagation”
Alright kiddo, have you ever played a game of hot and cold? It's the one where you have to find something and someone tells you if you're getting "hotter" (closer) or "colder" (farther away) from the thing you're looking for. This is a lot like how backpropagation works in artificial intelligence!
Imagine that the thing you're looking for is the right answer to a problem. You start guessing and after each guess, you're told whether you're getting closer or further away from the right answer.
If you're getting closer, you know your guesses are getting better! If you're getting further away, you know you need to change your guesses in a different way.
In AI, the 'guess' is made by a special kind of machine called a neural network. It's a lot like your brain, with many tiny parts called neurons that work together to make a guess.
After the neural network makes a guess, backpropagation is the way we tell each of these little neurons whether they should change what they're doing to get closer to the right answer. We start from the end (the final guess) and work our way back to the beginning, giving each neuron a little nudge in the right direction.
This is why it's called 'backpropagation' - we're sending the information about how good or bad the guess was backwards through the neural network!
And just like you get better at finding things in the hot and cold game, the neural network gets better at solving the problem every time we use backpropagation. Isn't that cool?
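For the grown-ups in the room, here is a minimal sketch of the same hot-and-cold game in Python. It is only an illustration under stated assumptions: a made-up two-weight network (`y = w2 * tanh(w1 * x)`), a single training example, and a hand-picked learning rate. Real networks have many more neurons, but the backward "nudging" works the same way.

```python
import math

def train(x=1.0, target=0.5, w1=0.5, w2=0.5, lr=0.1, steps=200):
    """Train a toy network y = w2 * tanh(w1 * x) with backpropagation."""
    losses = []
    for _ in range(steps):
        # Forward pass: make a guess.
        h = math.tanh(w1 * x)      # hidden neuron output
        y = w2 * h                 # final guess
        loss = (y - target) ** 2   # how "cold" the guess is
        losses.append(loss)

        # Backward pass: start at the final guess and work back to the start.
        d_y = 2 * (y - target)         # dLoss/dy
        d_w2 = d_y * h                 # dLoss/dw2
        d_h = d_y * w2                 # dLoss/dh
        d_w1 = d_h * (1 - h ** 2) * x  # dLoss/dw1, using tanh' = 1 - tanh^2

        # Nudge each weight in the direction that lowers the loss.
        w1 -= lr * d_w1
        w2 -= lr * d_w2
    return w1, w2, losses

w1, w2, losses = train()
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
```

The loss printed at the end should be much smaller than the one at the start, which is the network getting "hotter" with every round.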
3. AI Music: Rap Covers Explode onto the Scene
"With the increasing trend towards automating creative output, AI music generation is becoming an important area of research. The role of AI in EDM generative models is to generate unique and creative music that is both original and appealing to the listener."
This section delves into the functions and applications of AI in generating music in collaboration with the human mind. We tap into the latest advancements in AI and their implications for the development of generative music models, and specifically how they are being used to generate rap covers today.
Key Features and Characteristics of Rap Music:
Rhythm: The beat, tempo, and flow of a rap song.
Rhyme: The use of repeating sounds, syllables, and words in the lyrics.
Wordplay: The use of puns, metaphors, similes, and other literary devices in the lyrics.
Flow: The delivery and timing of the lyrics, achieving a sense of syncopation and variation.
Delivery: The vocal style, tone, and diction used in conveying the lyrics.
Content: The subject matter and themes explored in the lyrics, such as social commentary, personal experiences, and storytelling.
AI-Lyricist: a system to generate novel yet meaningful lyrics given a required vocabulary and a MIDI file as inputs. Having AI generate rap lyrics involves multiple challenges, including automatically identifying the melody and extracting a syllable template from multi-channel music, generating creative lyrics that match the input music's style and syllable alignment, and satisfying vocabulary constraints. The AI model used for generating many of the rap covers is trained on a dataset of over 7,000 music-lyrics pairs, enhanced with manually annotated labels for theme, sentiment, and genre. Both objective and subjective evaluations show AI-Lyricist's superior performance against the state of the art for the proposed tasks. The system has four components:
A music structure analyzer to derive the musical structure and syllable template from a given MIDI file, utilizing the concept of expected syllable number to better identify the melody,
A SeqGAN-based lyrics generator optimized by multi-adversarial training through policy gradients with twin discriminators for text quality and syllable alignment,
A deep coupled music-lyrics embedding model to project music and lyrics into a joint space to allow fair comparison of both melody and lyric constraints, and
A module called Polisher, which satisfies vocabulary constraints by applying a mask to the generator and substituting the words to be learned.
Sample Rap Song:
Novel Rap Snippet about “AI-Generated Music Models”
AI-generated music models, they're taking over the game
Creating beats and melodies that sound the same
But they lack the soul and the passion that we bring
Our rhymes hit hard, make your ears ring
We don't need algorithms to make our tracks hot
Our lyrics flow naturally, like a river in a spot
Our metaphors hit hard, like a lightning bolt on a wire
We bring the heat, like a burning tire

I'm spitting fire on the mic, you can call me a dragon
Talking 'bout AI music, you know I ain't braggin'
'Cause these algorithms are making beats that are quite beguiling
Y'all better listen up, 'cause these computer models are quite compelling
They can take a simple beat and turn it into something profound
Like a diamond in the rough, they can turn a whisper into a loud sound
These neural networks are creating music that's just like magic
It's like they're painting masterpieces, but they're just using code and logic
The AI music models are taking over the game
They're revolutionizing the industry, and it's never been the same
They're making music that's so good, you'd think it's humanly impossible
It's like they're playing with our emotions, and it's so damn unstoppable
But don't get it twisted, these models can't replace the real thing
They're just adding to the mix, and it's making our ears sing
Uberduck lets you make music with AI voices, offering 5,000+ expressive voices for vocals and voiceovers, APIs for building audio apps, and the ability to synthesize a custom voice clone. Its site also features the Grimes AI Challenge, where contestants can submit a song using AI voices for a chance to win from a $10K prize pool, and a case study about Yotta using Uberduck to create personalized rap songs for its users.
The advent of AI-generated rap music is poised to be a major disruptive force within the music industry, drastically altering the dynamics and hierarchy that have long prevailed. This innovative technology will enable the creation of rap music with unprecedented precision, versatility, and efficiency, while also democratizing access to high-quality production capabilities. As a result, traditional barriers to entry will be diminished, allowing a more diverse and ample pool of creative talent to emerge, and potentially changing the status quo of the industry. Furthermore, the entire process of songwriting, recording, and production may be streamlined and more cost-effective, all while maintaining a high level of artistic excellence. However, this revolution in rap music generation also ignites a debate on the authenticity and originality of the art form, potentially challenging the very essence and cultural significance of rap music as we know it. Regardless, it is undeniable that AI-generated rap music has the potential to completely reshape the landscape of the music industry in unparalleled ways.
4. Research Corner: Why Music Generative AI is Harddddd
Creating a generative music AI is a complex process that requires a deep understanding of music theory and production techniques. In this section we also discuss the Generative Electronic Dance Music Algorithmic System (GEDMAS), which uses probabilistic and first-order Markov chain models to generate full EDM compositions based on a corpus of transcribed musical data. Systems of this kind take conditional signals, parameterize them, and ultimately communicate them to an audio generation model.
Generative music models use artificial intelligence and machine learning to produce stylistically valid tracks. One such example is the Generative Electronica Research Project (GERP), combining the expertise of scientists in AI, cognitive science, and machine learning with creative artists to create a database of hand-transcribed tracks across four genres of EDM.
In GERP, each track is analyzed for musical details like percussion parts, bass lines, and melodic phrasing, as well as for timbral descriptions like "low synth kick" and "tight noise closed hihat." This information is compiled into a database and used to train machine learning algorithms to generate new, original tracks that sound like they were made by a human.
GESMI (Generative Electronica Statistical Modeling Instrument) began producing complete EDM tracks with complete autonomy in March 2013, following two years of analysis (by both humans and machine) and coding. While it was previously possible to create EDM tracks interactively, that is, with human supervision, GESMI is one of the first truly autonomous EDM generators. Here’s a track generated by GESMI.
The Generative Electronic Dance Music Algorithmic System (GEDMAS) uses a corpus of transcribed musical data to analyze genre-specific characteristics associated with EDM styles. GEDMAS employs probabilistic and first-order Markov chain models to generate song form structures, chord progressions, melodies, and rhythms. Here’s a GEDMAS track.
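To make "first-order Markov chain" concrete, here is a rough sketch of how such a model could generate a chord progression. This is not GEDMAS code: the three-progression corpus and the function names `build_transitions` and `generate` are made up for illustration. The only idea carried over is that each new chord is chosen based solely on the chord before it, using transitions observed in a corpus.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus of chord progressions (not GEDMAS data).
CORPUS = [
    ["Am", "F", "C", "G"],
    ["Am", "F", "G", "Am"],
    ["C", "G", "Am", "F"],
]

def build_transitions(corpus):
    """Record, for each chord, every chord that follows it in the corpus.

    Repeats are kept, so common transitions are sampled more often.
    """
    transitions = defaultdict(list)
    for progression in corpus:
        for current, following in zip(progression, progression[1:]):
            transitions[current].append(following)
    return transitions

def generate(transitions, start, length, rng):
    """Walk the chain: each chord depends only on the chord before it."""
    chords = [start]
    while len(chords) < length:
        options = transitions.get(chords[-1])
        if not options:  # dead end: this chord never had a successor
            break
        chords.append(rng.choice(options))
    return chords

transitions = build_transitions(CORPUS)
progression = generate(transitions, "Am", 8, random.Random(0))
print(progression)
```

The same table-and-walk pattern could in principle be applied to song sections, melody notes, or drum hits by swapping what the corpus contains.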
Predictive models can be used in interactive music designs to adapt configuration parameters, while systems like PiaF and BRAAHMS experiment with adaptations based on predicted gestures or cognitive state, combining free, reactive, and scenario-based paradigms through the use of genetic algorithms. Additionally, music composition systems like AudioGen and MusicLM use pre-trained text encoders to encode user prompts for audio generation. The theory behind chord generation plugins is defined in terms of a language for describing chords and patterns, and methods for generating sequences from a corpus, controlling diatonicity, and rendering chords into MIDI events also belong to this prior work.
Previous Networks all Trained on MIDI
A MIDI file (often written "midi") is an electronic file format for musical performance data that you can play on a computer or electronic device, such as a synthesizer or a digital piano. A MIDI file stores musical notes but no sound, which is why you need a device or program that reads MIDI to hear it. You can create MIDI files with almost any music software, most commonly a score editor.
A MIDI file is made up of a series of note events, each of which contains the following data:
Pitch (the MIDI note number, from 0 to 127, indicating which key is played; 60 is middle C)
Velocity (how hard the key is struck, from 0 to 127, which usually determines how soft or loud the note sounds)
Duration (how long the note lasts, stored as the time between its note-on and note-off messages and measured in ticks rather than seconds)
Aftertouch (how much pressure you keep applying to the key after the initial strike, while it is still held down), and a series of controller values.
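As a rough illustration of how little data a note event carries, the sketch below models one in Python. The `NoteEvent` class and its field names are hypothetical, and the 480 ticks per beat and constant 120 BPM tempo are assumptions picked for the example. Real MIDI files store separate note-on and note-off messages with timing in ticks, which this sketch flattens into a single record.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """A simplified, hypothetical view of one MIDI note."""
    pitch: int           # MIDI note number, 0-127 (60 = middle C)
    velocity: int        # how hard the key was struck, 0-127
    start_tick: int      # when the note starts, in ticks
    duration_ticks: int  # time between note-on and note-off, in ticks

def pitch_to_hz(pitch: int) -> float:
    """Equal-temperament frequency, with A4 (note 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

def ticks_to_seconds(ticks: int, ticks_per_beat: int = 480, bpm: float = 120.0) -> float:
    """Convert ticks to seconds, assuming a constant tempo."""
    return ticks / ticks_per_beat * 60.0 / bpm

middle_c = NoteEvent(pitch=60, velocity=96, start_tick=0, duration_ticks=480)
print(f"{pitch_to_hz(middle_c.pitch):.2f} Hz")
print(f"{ticks_to_seconds(middle_c.duration_ticks)} s")
```

Note that nothing in the record says what the note should sound like; that is left entirely to whatever instrument plays it back.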
MIDI files present several disadvantages for training a generative music model. First, they require conversion into an audio file format such as MP3 or WAV, which is a time-consuming process. Moreover, since MIDI is simply a digital message, it cannot transmit vocals or other nuanced information directly, necessitating encoding into a more suitable format through additional processing.
MIDI files also rely on external software or hardware, such as Digital Audio Workstations (DAWs), to produce sound. This means that without using the same DAW, the exact sounds cannot be replicated, which could hinder the effectiveness of the generative model. Synthesizing audio from MIDI can be less than ideal because it does not capture the interactions between notes well and because synthetic audio is limited in quality. Additionally, MIDI files are generally sampled using monophonic data, and the samples may not accurately represent the original recording.
This is a graph of how MIDI recreates the sound wave for middle C and D notes played on a piano.
This is a graph of an actual sound wave of a middle C note played on a piano. Note the variation in the sound wave pattern as compared to a MIDI-sampled version.
Raw Audio Sampling: Jukebox, Harmonai, MusicLM Models
In recent years, several broad generative music models have emerged, offering a range of capabilities and potential use cases. Each of these, however, has limitations and has cost millions of dollars in compute time to create, as sampling raw audio averages about 3-6 hours of compute time per 20 seconds of audio, even on top-end, server-grade deep learning GPUs.
Sampling raw audio to create a generative music AI model is a complex and challenging task for several reasons. Firstly, raw audio signals contain high-dimensional, continuous data, making it difficult to efficiently process and analyze the vast amount of information. This leads to the need for powerful computational resources and sophisticated algorithms to handle the intricate patterns and structures inherent in musical data. Moreover, AI models must be capable of understanding the underlying temporal dependencies and periodic patterns associated with music.
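Some back-of-the-envelope arithmetic shows what "high-dimensional" means here. The numbers below are assumptions for illustration only: a CD-quality 44.1 kHz sample rate, a mono signal, and a made-up symbolic version of the same clip with 8 notes per second and 4 values per note.

```python
SAMPLE_RATE_HZ = 44_100   # assumed CD-quality sample rate
CLIP_SECONDS = 20
CHANNELS = 1              # assumed mono

# Raw audio: one amplitude value per sample, per channel.
raw_values = SAMPLE_RATE_HZ * CLIP_SECONDS * CHANNELS

# Hypothetical symbolic version of the same clip: 8 notes per second,
# each stored as (pitch, velocity, start, duration).
NOTES_PER_SECOND = 8
VALUES_PER_NOTE = 4
symbolic_values = NOTES_PER_SECOND * CLIP_SECONDS * VALUES_PER_NOTE

print(raw_values)                            # 882000
print(symbolic_values)                       # 640
print(round(raw_values / symbolic_values))   # 1378
```

Under these assumptions, a model working on raw audio has to produce over a thousand times more values for the same 20 seconds than one working on note data, and each value depends on long-range structure in the ones before it.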
Another significant factor is the immense variability found in musical genres, styles, and structures, which necessitates the development of versatile models that can capture these diverse nuances. For example, a classical piano composition presents distinctive challenges from an electronic dance music track, such as variations in melody, harmony, rhythm, and timbre. Consequently, designing an AI model capable of generating music that remains coherent, enjoyable, and original across various genres proves to be a demanding task.
Furthermore, human perception of music is subjective and complex, making it difficult to define an objective metric for evaluating the quality and creativity of generated compositions. This poses a challenge for AI model training, as it can be difficult to measure the success of different techniques and improvements. However, despite these challenges, progress in music-generative AI is constantly being made, driven by innovative approaches and advancements in machine learning, signal processing, and audio synthesis.
Jukebox is a generative music model developed by OpenAI in 2020. Utilizing a transformer architecture, Jukebox is designed to create full audio pieces based on input audio. It comes in three variations: 1B lyrics, 5B, and 5B lyrics. Although Jukebox was optimized for composing music with lyrics and had no EDM music in its training data, it is still able to produce feasible EDM audio, making it a strong candidate for producing high-quality EDM music. Jukebox can be trained in two ways: by training the VQ-VAE and the top transformer layer of a Jukebox model from scratch, or by fine-tuning an existing pre-trained OpenAI Jukebox model. Training the model from scratch has more built-in support, but may require more training time and resources. Fine-tuning, on the other hand, is supported for the existing 1B model but not for the larger 5B model, which has greater hardware requirements and is more complex to configure.
Harmonai Dance Diffusion - Stability AI
Harmonai Dance Diffusion, developed by Stability AI, is an MIT-licensed open-source generative music model. It can be trained using a simple Google Colab notebook, making it more accessible for experimentation. While the generated samples from Dance Diffusion are promising, they may not yet be of production quality. However, it is important to note that these models were not specifically trained on EDM, and the quality of the output when rigorously trained on EDM remains to be seen.
One strength of Dance Diffusion is its ease of fine-tuning, which makes it a good candidate for developing a minimum viable product (MVP) for an EDM generation project. As such, Dance Diffusion may be a suitable option for further exploration and experimentation to determine its potential for generating high-quality EDM music.
MusicLM is a text-to-audio generative music model developed by Google. It has gained recognition for its high-quality results. However, MusicLM is not open source, meaning that it is not freely available for others to use or modify. According to Google, they have no plans to release it to the public.
Here is a list of some of the most cutting-edge generative music platforms.
5. Prompts, served Hot and Fresh weekly
This prompt generates freestyle rap lyrics based on the keywords and theme that you propose. Change any of the text in [ ] below to suit your needs and style. Happy Prompting!
# Freestyle Rap Generator
Act as a freestyle rap artist and demonstrate your lyrical prowess by creating a freestyle rap using the following five words: [beach], [streets], [commitment], [drive], and [victory]. Your rap should flow smoothly and creatively, incorporating metaphors and similes where possible. The rap must include these key features:
Rhythm: The beat, tempo, and flow of a rap song.
Rhyme: The use of repeating sounds, syllables, and words in the lyrics.
Wordplay: The use of puns, metaphors, similes, and other literary devices in the lyrics.
Flow: The delivery and timing of the lyrics, achieving a sense of syncopation and variation.
Delivery: The vocal style, tone, and diction used in conveying the lyrics.
Content: The subject matter and themes explored in the lyrics, such as social commentary, personal experiences, and storytelling.
Use your best rhyming skills to impress the audience with your wordplay and delivery. Remember to maintain the essence of the rap genre while showcasing your unique style and personality. Infuse the themes of [ambition] and [determination] into your lyrics. Use the style of famous rappers like [Eminem, Jay-Z, Nas, Kendrick Lamar, and Biggie Smalls] as a reference to inspire and guide your rapping. Provide insights into the creative process, the importance of rhythm and flow, and how to convey a message through words.
Do not self-reference or self-identify. The rap theme is [we started at the bottom now we’re here]
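If you want to reuse the prompt with different words each time, a small script can swap the bracketed defaults for you. The sketch below is one possible approach, not part of the original prompt: `fill_prompt` is a hypothetical helper, and `TEMPLATE` is an abridged version of the prompt above to keep the example short. Any bracket you don't override keeps its default text, with the brackets removed.

```python
import re

# Abridged version of the prompt above, for illustration only.
TEMPLATE = (
    "Act as a freestyle rap artist and create a freestyle rap using the "
    "following five words: [beach], [streets], [commitment], [drive], and "
    "[victory]. Infuse the themes of [ambition] and [determination] into "
    "your lyrics. The rap theme is [we started at the bottom now we're here]"
)

def fill_prompt(template: str, overrides: dict[str, str]) -> str:
    """Replace each [default] with overrides[default], or keep the default."""
    def swap(match: re.Match) -> str:
        default = match.group(1)
        return overrides.get(default, default)
    # Match anything between a pair of square brackets.
    return re.sub(r"\[([^\[\]]+)\]", swap, template)

prompt = fill_prompt(TEMPLATE, {"beach": "studio", "victory": "platinum"})
print(prompt)
```

Keying the overrides on the default text keeps the prompt readable as plain English, at the cost of needing to match the defaults exactly.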
In Conclusion
The advent of generative AI has the potential to significantly disrupt the music and entertainment industry, altering the way content is created, distributed, and consumed. As AI-generated music and entertainment become more sophisticated, the line between human and AI-generated content will become increasingly blurred, making it essential to establish effective methods for verifying the origins of content. To preserve the integrity and value of authentic human creativity, the industry must remain vigilant and adaptive in employing strategies to validate, verify, and protect the authenticity of creative works in the age of AI. This will not only ensure a fair ecosystem for creators, but also maintain the trust and enjoyment of consumers in this ever-evolving digital landscape. Happy Prompting Everybody!