MusicLM: Generating Music From Text 

“We introduce MusicLM, a model generating high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.”

To get a real sense of what this model can do, check out the examples on the project page:

Audio Generation From Rich Captions

Long Generation 

Story Mode: The audio is generated by providing a sequence of text prompts. These influence how the model continues the semantic tokens derived from the previous caption.

Text and Melody Conditioning: By adding melody embeddings to the conditioning, we can generate music that respects the text prompt while following the provided melody.

Painting Caption Conditioning (!!!)

Generation Diversity

Test the diversity of the generated samples while keeping the conditioning and/or the semantic tokens constant.


Internet Trend: All your databases are ours

The frontend is becoming the backend to a new set of smart interfaces.

AI is starting to act like a long poll against the corpus that is the internet. This is transforming the traditional client-server relationship: the internet becomes the computational backend to a new generation of smart interfaces.

These interfaces are increasing productivity and providing more meaningful results, moving beyond simple data and information retrieval to delivering cohesive thoughts and insights.

This shift can be observed everywhere. One example is Perplexity AI, which presents information in a more intuitive way, eliminating the need for multiple clicks and searches, similar to Google’s “remove the click” tactic.

Here I did a search for the TAM (total addressable market) for over-the-air TV:
https://www.perplexity.ai/?uuid=d9cd029c-b15b-4ebd-8248-f13bf4f91160

This shift is worth noting.

 

Welcome to the unreal world of synthetic media

Where belief is suspended and provenance is unknown.

Synthetic media is media created or modified by artificial intelligence and machine learning.

This is far from a new topic here, but it is one that has moved forward with major advances in technology and tooling.

In a world where it is difficult to distinguish between real and artificially generated content, our ability to believe and trust is challenged.

 

When our primary methods of encoding and verifying information are no longer reliable, it can be difficult to determine provenance and the accuracy of the information we encounter. This raises important questions about the impact of synthetic media on society and the potential risks and consequences of this technology.

Synthetic media is set to change the way we create and consume media, making it more efficient and accessible. It can be used for entertainment, education, and communication, but it will more likely be used for advertising, disinformation, and information warfare.

The lack of control over the content that is created, and the potential for it to be used for nefarious purposes such as creating deepfakes for deception or sabotage, is a given. This again raises questions about authenticity and provenance when real and artificially generated content are hard to tell apart.

The societal implications of synthetic media are significant and complex, and they will depend on how the technology is used and regulated. It is important to consider ways to mitigate the risks of synthetic media while maximizing the benefits. Regulation is unlikely and weaponization is more than likely. Action to follow.

Metrics for long term success

I spent a long time establishing engagement as the key performance indicator in social media: advocating for its importance, publishing papers, and creating tools to make the case. It was wrong to prioritize engagement above all else.

Net positive impact matters more in the long run. While engagement may have been a useful metric in the past, it is not the most important factor in achieving long-term success. News media companies and social media products should shift their focus away from engagement and towards the real-world impact of their offerings.

This applies specifically to the news and media sectors and to social products and networks. The goal should be to create impact rather than just generate clicks or short-term viewership.

Prioritizing impact and working towards long-term success, so as to make a more meaningful difference, is the right path. In the end, companies and products will still need to operate in the short term on active-user and engagement metrics, but at the same time orient towards impact.

Closing out the year…

Sunday, Jan 1, 2023 is Public Domain Day. Works from 1927 fall into the public domain, including Metropolis, The Unknown (Chaney), and The Jazz Singer.

Awful AI is a curated list to track current scary usages of AI

Tim Cook and Japanese PM Kishida Discussed User Privacy and Digital ‘My Number’ IDs.
Top million most popular websites, published by Google: Cached Chrome Top Million Websites

Long reads

Why Would AI “Aim” To Defeat Humanity?
The Alt-Right Manipulated My Comic. Then A.I. Claimed It
Canary tokens

 

Some crypto Transaction Details are jaw-dropping.
This view of Hacker News
Pen testing will be in the news soon thanks to Flipper Zero; the zero is for zero-day 🙂
GPT takes the bar exam and passes in Evidence and Torts
A good thread on Stable Diffusion’s progress this last year.

2022 was the year of the prompt for me. I have moved to the terminal as my main interface, or at least to parity with the web view, and writing prompts and chaining them together has become my main way to retrieve information.

@JudyWoodruff ends her time as the @NewsHour anchor. Thank you.